Abstract. Phylogenetic networks are used to display the relationship of different species 
whose evolution is not treelike, which is the case, for instance, in the presence of hybridization 
events or horizontal gene transfers. Tree inference methods such as Maximum Parsimony 
need to be modified in order to be applicable to networks. In this paper, we discuss two 
different definitions of Maximum Parsimony on networks, "hardwired" and "softwired", and 
examine the complexity of computing them given a network topology and a character. By 
exploiting a link with the problem Multicut, we show that computing the hardwired parsi- 
mony score for 2-state characters is polynomial-time solvable, while for characters with more 
states this problem becomes NP-hard but is still approximable and fixed parameter tractable 
in the parsimony score. On the other hand we show that, for the softwired definition, ob- 
taining even weak approximation guarantees is already difficult for binary characters and 
restricted network topologies, and fixed-parameter tractable algorithms in the parsimony 
score are unlikely. On the positive side we show that computing the softwired parsimony 
score is fixed-parameter tractable in the level of the network, a natural parameter describing 
how tangled reticulate activity is in the network. Finally, we show that both the hardwired 
and softwired parsimony score can be computed efficiently using Integer Linear Program- 
ming. The software has been made freely available. 



1. Introduction 

In phylogenetics, graphs are used to describe the relationships between different species. 
Traditionally, these graphs are trees, and biologists aim at reconstructing the so-called 'tree of 
life', i.e. the tree of all living species Q]. However, trees cannot display reticulation events such 
as hybridizations or horizontal gene transfers, which are known to play an important role in 
the evolution of certain species [251 113 E E] . In such cases considering phylogenetic networks 
rather than trees is potentially more adequate, where in its broadest sense a phylogenetic 
network can simply be thought of as a graph (directed or undirected) with its leaves labelled 
by species [HI [281 [29] . This implies that tree reconstruction methods, i.e. the methods used 
to infer the best tree from e.g. DNA or protein data, need to be adapted to networks. 

One of the most famous tree reconstruction methods is Maximum Parsimony [8]. While 
this method has been shown to have drawbacks like statistical inconsistency in the so-called 
'Felsenstein zone' [9] , it is still widely used mainly due to its simplicity: Maximum Parsimony 
does not depend on a phylogenetic model and works in a purely combinatorial way. Moreover, 
for a given tree the optimal parsimony score can be found in polynomial time using the well- 
known Fitch algorithm [11] . This problem of finding the optimal parsimony score for a given 
tree is often referred to as the "small parsimony" problem. The "big parsimony" problem, on 
the other hand, aims at finding the most parsimonious tree amongst all possible trees - and 
this problem has been proven to be NP-hard; it is a close relative of the classical Steiner 
Tree problem [TBI 12]. 
Recent studies have introduced extensions of the tree-based parsimony concept to phylo- 
genetic networks [181 Ell EQ] and a biological case-study was presented in [19]. Basically, 
Maximum Parsimony on networks can be viewed in two ways: If one thinks of evolution as 
a tree-like process (but maybe with different trees for different genes, all of which are
represented by a single network), one can define the parsimony score of a network as the score 
of the best tree inside the network. The other way of looking at Maximum Parsimony on 
networks is just the same as the Fitch algorithm's view on trees: One can try to find the 
assignment of states to internal nodes of the network such that the total number of edges 
that connect nodes in different states is minimized. While the first concept may be regarded 
as more biologically motivated, the second one is in a mathematical sense the natural
extension of the parsimony concept to networks. Both concepts of parsimony on networks are 
considered in this manuscript, and we formally introduce them in Section 2 as softwired and 
hardwired parsimony, respectively. 

Given a phylogenetic network and a criterion like Maximum Parsimony, several questions 
come to mind: How hard is it to calculate the parsimony score (both in the hardwired and 
softwired sense) for a given network ("small parsimony" problem)? How hard is it to find the 
best network ("big parsimony" problem)? In the present paper we focus only on the "small 
parsimony" problem, which in the tree case can easily be solved using the Fitch algorithm. For 
networks, Kannan and Wheeler [21], who introduced the hardwired score, conjectured that the 
hardwired problem would be hard. We show that this problem is indeed NP-hard and APX-hard
(Section 3, Corollary 2) whenever characters employing more than two states are used, 
but we also show that it is polynomial-time solvable for binary characters (Corollary 1). In 
Section 2.1 we also analyse the behaviour of their ExtendedFitch algorithm, showing that 
it does not compute the optimal hardwired parsimony score and that it does not approximate 
the softwired parsimony score well. 

For softwired parsimony we show in Section 4, Theorem 2, that the problem is NP-hard 
even for binary characters, and we additionally show that NP-hardness cannot be overcome 
by considering only so-called binary tree-child time-consistent networks (Theorem 3). In fact, 
we show that it is not only difficult to compute the softwired parsimony score exactly, but that 
it is also extremely difficult to approximate: for any constant ε > 0 an approximation factor 
of |X|^(1−ε) is not possible in polynomial time, unless P=NP, where |X| denotes the number of 
species under investigation. This holds even for tree-child time-consistent networks (Theorem 
4). This shows that the trivial approximation factor of |X| is in a certain sense the best that is 
possible in polynomial time. For binary networks we show a slightly weaker inapproximability 
threshold: |X|^(1/3−ε) (Theorem 5). Both the hardness and inapproximability results described 
here for the softwired model are stronger than the result given in [18], which shows
APX-hardness on nonbinary networks. We note that the hardness results in [30] are not directly 
comparable, since they adopt the recombination network model of phylogenetic networks (see 
e.g. [HI HE] and also the discussion in [18]). 

While the hardwired parsimony score can be shown to be fixed parameter tractable with the 
parsimony score as parameter (Section 3, Corollary 3), the softwired parsimony score is not 
fixed parameter tractable with the parsimony score as parameter, unless P=NP. (See [1'2\ 131 j 
for an introduction to fixed parameter tractability). Indeed, it is even NP-hard to determine 
whether the softwired parsimony score is 1. However, we show in Section 4, Theorem 6, 
that the softwired parsimony score is fixed parameter tractable in the level of the network, 
where the level of a network is the maximum amount of reticulate activity in a biconnected 
component of the network (see e.g. [244123] for an overview). Moreover, in Section 6 we present 
an Integer Linear Program to calculate both the softwired and hardwired parsimony score of 
a given character (or, more generally, multiple sequence alignment) on a phylogenetic network 
and give a preliminary analysis of its performance. This is the first practical exact method for 
computation of parsimony on medium to large networks, supplementing the heuristics given 
in [18, 21]. An implementation of this program is freely available [10]. 

Finally, in Section 7 we summarize our results and state some open problems for future 
research. 



2. Preliminaries 

Let X be a finite set. An unrooted phylogenetic network on X is a connected, undirected 
graph that has no degree-2 nodes and that has its degree-1 nodes (the leaves) bijectively 
labelled by the elements of X. A rooted phylogenetic network on X is a directed acyclic 
graph that has a single indegree-0 node (the root), no indegree-1 outdegree-1 nodes, and its 
outdegree-0 nodes (the leaves) bijectively labelled by the elements of X. We identify each leaf 
with its label. The indegree of a node v of a rooted phylogenetic network is denoted δ^-(v) 
and v is said to be a reticulation node if δ^-(v) ≥ 2. An edge (u, v) is called a reticulation 
edge if v is a reticulation node and it is called a tree edge otherwise. A proper subset C ⊂ X 
is referred to as a cluster of X. 

When we refer to a phylogenetic network, it can be either rooted or unrooted. We use 
V(N) and E(N) to denote, respectively, the node and edge set of a phylogenetic network N. 
To simplify notation, we use the notation (u, v) for a directed as well as for an undirected 
edge between u and v. A phylogenetic network is binary if each node has total degree at 
most 3 and (in case of a rooted network) the root has outdegree 2 and all reticulations have 
outdegree 1. 

The reticulation number of a phylogenetic network N can be defined as |E(N)| − |V(N)| + 1. 
Hence, the reticulation number of a rooted binary network is simply the number of reticulation 
nodes. A rooted phylogenetic tree is a rooted phylogenetic network with no reticulation nodes, 
i.e. with reticulation number 0. A biconnected component of a phylogenetic network is a 
maximal biconnected subgraph. A phylogenetic network is said to be a level-k network if 
each biconnected component has reticulation number at most k, and at least one biconnected 
component has reticulation number exactly k. 

A rooted phylogenetic network N is said to be tree-child if each non-leaf node has a child 
that is not a reticulation and N is said to be time-consistent if there exists a "time-stamp" 
function t : V(N) → ℕ such that for each edge (u, v) it holds that t(u) = t(v) if v is a reticulation 
and t(u) < t(v) otherwise [3]. 

If F is a finite set and p ∈ ℕ, then a p-state character on F is a function from F to {1, ..., p}. 
A p-state character is binary if p = 2. Let α be a p-state character on X and N a phylogenetic 
network on X. Then, a p-state character τ on V(N) is an extension of α to V(N) if τ(x) = 
α(x) for all x ∈ X. Given a p-state character τ on V(N) and an edge e = (u, v) of N, the 
change c_τ(e) on edge e w.r.t. τ is defined as: 

\[ c_\tau(e) = \begin{cases} 0 & \text{if } \tau(u) = \tau(v), \\ 1 & \text{if } \tau(u) \neq \tau(v). \end{cases} \]

The hardwired parsimony score of a phylogenetic network N and a p-state character α 
on X can be defined as: 

\[ PS_{hw}(N, \alpha) = \min_{\tau} \sum_{e \in E(N)} c_\tau(e), \]

where the minimum is taken over all extensions τ of α to V(N). 
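
As an illustration of the definition of the hardwired score, the following minimal sketch enumerates every extension of a character to the internal nodes and counts the changed edges. It is exponential in the number of internal nodes and is meant only to make the definition concrete. The function name `hardwired_score` and the toy network (root `r`, tree nodes `a` and `b`, reticulation `h`) are assumptions made for this example and do not come from the paper.

```python
from itertools import product

def hardwired_score(edges, leaf_states, p):
    """Brute-force hardwired score: minimise the number of edges whose endpoints differ."""
    nodes = sorted({v for e in edges for v in e})
    internal = [v for v in nodes if v not in leaf_states]
    best = None
    for states in product(range(1, p + 1), repeat=len(internal)):
        tau = dict(leaf_states)            # an extension agrees with the character on the leaves
        tau.update(zip(internal, states))  # free choice of state on every internal node
        cost = sum(tau[u] != tau[v] for u, v in edges)
        if best is None or cost < best:
            best = cost
    return best

# Hypothetical toy network: the reticulation h has parents a and b.
edges = [("r", "a"), ("r", "b"), ("a", "x1"), ("b", "x2"),
         ("a", "h"), ("b", "h"), ("h", "x3")]
alpha = {"x1": 1, "x2": 2, "x3": 3}        # a 3-state character on the leaves
print(hardwired_score(edges, alpha, p=3))  # 2
```

Edge orientation plays no role here, which matches the remark that the hardwired score applies to rooted and unrooted networks alike.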

Now, consider a phylogenetic network N on X and a phylogenetic tree T on X, where 
either both N and T are rooted or both are not. We say that T is displayed by N if T can be 
obtained from a subgraph of N by suppressing non-root nodes with total degree 2. For a rooted 
phylogenetic network N, a switching of N is obtained by, for each reticulation node, deleting 
all but one of its incoming edges [23]. We denote the set of switchings of N by S(N). It can 
easily be seen that T is displayed by N if and only if T can be obtained from a switching 
of N by deleting indegree-0 outdegree-1 nodes, deleting unlabelled outdegree-0 nodes and 
suppressing indegree-1 outdegree-1 nodes. Let T(N) denote the set of all phylogenetic trees 
on X that are displayed by N. The softwired parsimony score of a phylogenetic network N 
and a p-state character α on X can be defined as: 

\[ PS_{sw}(N, \alpha) = \min_{T \in \mathcal{T}(N)} \min_{\tau} \sum_{e \in E(T)} c_\tau(e), \]

where the second minimum is taken over all extensions τ of α to V(T). 

Note that both the hardwired and softwired parsimony score can be used for rooted as well 
as for unrooted networks although the softwired parsimony score might seem more relevant 
for rooted networks and the hardwired parsimony score for unrooted ones. 

It can easily be seen that, if N is a tree, PS_hw(N, α) = PS_sw(N, α). However, we now 
show that, if N is a network, the difference between the two can be arbitrarily large. 

Figure 1 presents an example of a rooted binary phylogenetic network N and a binary 
character α, where PS_sw(N, α) = 2 regardless of the number of reticulation nodes in N. This 
is due to the fact that all right-hand side parental edges of the reticulation nodes could be 
switched off, such that only one change from 0 to 1 would be required in the resulting tree 
on the edge just above all taxa labelled "1" and another change from 1 to 0 in the (0,1)-cherry. 
However, PS_hw(N, α) can be made arbitrarily large by extending the construction 
in the expected fashion, as in this network we have PS_hw(N, α) = r + 1, where r denotes 
the number of reticulation nodes in N. So the difference PS_hw(N, α) − PS_sw(N, α) equals 
r − 1, where r can be made arbitrarily large, which shows that PS_hw(N, α) is not an 
o(n)-approximation of PS_sw(N, α), where n is the number of taxa. Note that the construction 
shown in Figure 1 is binary, tree-child and time-consistent. 

2.1. Extended Fitch Algorithm. Kannan and Wheeler introduced the hardwired parsi- 
mony score for rooted networks and proposed a heuristic by extending the well-known Fitch 
algorithm for trees. We will call this algorithm ExtendedFitch. We show in this section 
that ExtendedFitch does not compute the hardwired parsimony score optimally and does 
not provide a good approximation for the softwired parsimony score. 

The example from Figure 1 can be used again, in order to show that ExtendedFitch 
does not compute an o(n)-approximation of the softwired parsimony score. Indeed,
ExtendedFitch gives all internal nodes character state 0, leading to a total score of r + 1 = (n + 1)/3 
(which is in this case indeed equal to the hardwired parsimony score), while we already showed 
that PS_sw(N, α) = 2. Hence, the approximation ratio of ExtendedFitch is at least (r + 1)/2 
as a function of r and at least (n + 1)/6 as a function of n. 
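
The two ratios can be checked directly from the leaf count n = 3r + 2 given in the caption of Figure 1. Written out as a short derivation:

```latex
\[
  r + 1 \;=\; \frac{(3r + 2) + 1}{3} \;=\; \frac{n + 1}{3},
  \qquad
  \frac{r + 1}{PS_{sw}(N, \alpha)} \;=\; \frac{r + 1}{2} \;=\; \frac{n + 1}{6}.
\]
```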
Figure 1. Example of a rooted phylogenetic network N and binary character α
for which the difference between PS_hw(N, α) and PS_sw(N, α) is arbitrarily
large (r − 1 if the network is extended to have r reticulations and 
n = 3r + 2 leaves) and for which ExtendedFitch does not compute an 
o(n)-approximation of PS_sw(N, α). 

Note that Theorem 4 is furthermore complexity-theoretic evidence that ExtendedFitch, 
a polynomial-time algorithm, cannot approximate the softwired parsimony score of a network 
well (unless P=NP). 

Next we show that ExtendedFitch does not compute PS_hw(N, α) optimally, even if α is 
a binary character. Consider Figure 2. This figure displays a rooted phylogenetic network N 
with three leaves, one of which is directly connected to the only reticulation node. The network 
is again binary, tree-child and time-consistent. ExtendedFitch fixes the reticulation node 
to be in state 0, whereas all other internal nodes can be either 0 or 1. The two optimal 
solutions found by ExtendedFitch are illustrated by Figure 2: they either uniformly set all 
internal nodes other than the reticulation node to state 0 or to state 1. Thus, the resulting 
score is 2, as either both pending edges leading to the leaves labelled 1 need a change or 
both edges leading to the reticulation node. The most parsimonious solution, however, would 
be to set all internal nodes, including the reticulation node, to state 1. This way, only 
one change on the edge from the reticulation node to its pending leaf would be required. 
Figure 2. Example of a rooted phylogenetic network N and binary character 
α for which ExtendedFitch does not provide the optimal parsimony score 
PS_hw(N, α). The small numbers refer to the two possible internal labellings 
as suggested by ExtendedFitch. Both require two changes on the marked 
edges. However, the optimal parsimony score is 1: If all internal nodes are 
labelled 1, then only one change is needed on the edge from the reticulation 
node to the leaf labelled 0. 

Therefore, ExtendedFitch cannot be used to calculate the hardwired parsimony score of a 
character on a network. 
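
The counts quoted for Figure 2 can be reproduced with a few lines of code. The sketch below does not implement ExtendedFitch; it only evaluates the three labellings described in the text. The edge list is an assumed topology that matches the verbal description (three leaves, one reticulation `h` whose only child is the leaf in state 0), since the figure itself is not reproduced here, and all node names are hypothetical.

```python
def count_changes(edges, tau):
    """Number of edges (u, v) whose endpoints carry different states."""
    return sum(tau[u] != tau[v] for u, v in edges)

# Assumed topology: tree nodes a and b are both parents of the reticulation h.
edges = [("root", "a"), ("root", "b"), ("a", "x1"), ("b", "x2"),
         ("a", "h"), ("b", "h"), ("h", "x3")]
leaves = {"x1": 1, "x2": 1, "x3": 0}

def labelling(root, a, b, h):
    """Combine the fixed leaf states with a choice of internal states."""
    return {**leaves, "root": root, "a": a, "b": b, "h": h}

fitch_zero = labelling(0, 0, 0, 0)  # reticulation 0, other internal nodes 0
fitch_one = labelling(1, 1, 1, 0)   # reticulation 0, other internal nodes 1
optimal = labelling(1, 1, 1, 1)     # every internal node in state 1

print(count_changes(edges, fitch_zero),  # 2
      count_changes(edges, fitch_one),   # 2
      count_changes(edges, optimal))     # 1
```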

2.2. A comment on model differences. Our core definition of phylogenetic network is 
slightly less restricted than the definitions given in [18, 21]. However, the strong hardness 
and inapproximability results we give in this article still hold under heavy topological and 
biological restrictions (degree restrictions, tree-child, time-consistent) that are often subsumed 
into the core definitions given in other articles. Moreover, an obvious advantage of our 
definition is that all the positive results in the article apply to the largest possible class of 
phylogenetic networks. 

3. Computing the hardwired parsimony score of a phylogenetic network 

Given an undirected graph G and a set Γ of nodes of G called terminals, a multiterminal 
cut of (G, Γ) is a subset E' of the edges of G such that each terminal is in a different
connected component of the graph obtained from G by removing the edges of E'. A minimum 
multiterminal cut is a multiterminal cut of minimum size. The following theorem shows that 
computing the hardwired parsimony score of a phylogenetic network is at most as hard as 
Multiterminal Cut, the problem of finding a minimum multiterminal cut. 

Theorem 1. Let N be a phylogenetic network on X and α a p-state character on X. Let G 
be the graph obtained from N by merging all leaves x with α(x) = i into a single node γ_i, 
for i = 1, ..., p. Then, the size of a minimum multiterminal cut of (G, {γ_1, ..., γ_p}) is equal 
to PS_hw(N, α). 

Proof. First consider an extension τ of α to V(N) for which PS_hw(N, α) = Σ_{e∈E(N)} c_τ(e) 
(i.e. an optimal extension). Let E' be the set of edges e with c_τ(e) = 1. Since τ(γ_i) = i for 
i = 1, ..., p, any path from γ_i to γ_j with i ≠ j contains at least one edge of E'. Hence, E' is 
a multiterminal cut. Moreover, PS_hw(N, α) = Σ_{e∈E(N)} c_τ(e) = |E'|. Hence, PS_hw(N, α) is 
greater than or equal to the size of a minimum multiterminal cut. 

Now consider a minimum multiterminal cut E' of G and let G' be the result of removing 
the edges in E' from G. We define an extension τ of α to V(N) as follows. First, we set 
τ(x) = α(x), for all x ∈ X. Then, for each node v that is in the same connected component 
of G' as γ_i, set τ(v) = i. Finally, for each remaining node, set τ(v) = p. Then, each 
edge e ∉ E' has c_τ(e) = 0. Consequently, each edge e with c_τ(e) = 1 is in E'. Hence, 
PS_hw(N, α) ≤ Σ_{e∈E(N)} c_τ(e) ≤ |E'|. It follows that PS_hw(N, α) is less than or equal to the size of 
a minimum multiterminal cut, which concludes the proof. □ 

Corollary 1. Computing the hardwired parsimony score of a phylogenetic network and a 
binary character is polynomial-time solvable. 

Proof. This follows directly from Theorem 1 because, in the case of two terminals,
Multiterminal Cut becomes the classical minimum s-t cut problem, which is polynomial-time 
solvable. □ 
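
Corollary 1 can be sketched in code as follows: merge the leaves of each state into one terminal, as in Theorem 1, and compute a minimum s-t cut via unit-capacity augmenting paths. The function name `hardwired_binary` and the toy inputs are assumptions for this illustration, and the plain breadth-first augmenting-path routine is one standard way to compute a minimum cut, not necessarily the method the authors had in mind.

```python
from collections import defaultdict, deque

def hardwired_binary(edges, leaf_states):
    """Hardwired score of a 2-state character as a minimum s-t cut (Theorem 1 with p = 2)."""
    states = sorted(set(leaf_states.values()))
    if len(states) < 2:
        return 0                              # a constant character needs no change
    s, t = ("terminal", states[0]), ("terminal", states[1])

    def merge(v):                             # leaves collapse onto the terminal of their state
        return ("terminal", leaf_states[v]) if v in leaf_states else v

    cap, adj = defaultdict(int), defaultdict(set)
    for u, v in edges:
        u, v = merge(u), merge(v)
        if u == v:
            continue
        cap[u, v] += 1                        # an undirected unit edge becomes two unit arcs
        cap[v, u] += 1
        adj[u].add(v)
        adj[v].add(u)

    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:      # breadth-first search in the residual graph
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[u, v] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                       # no augmenting path left: flow equals the cut size
        v = t
        while parent[v] is not None:          # push one unit of flow along the path found
            u = parent[v]
            cap[u, v] -= 1
            cap[v, u] += 1
            v = u
        flow += 1

edges = [("root", "a"), ("root", "b"), ("a", "x1"), ("b", "x2"),
         ("a", "h"), ("b", "h"), ("h", "x3")]
print(hardwired_binary(edges, {"x1": 1, "x2": 1, "x3": 0}))  # 1
```

On the assumed three-leaf network used here the result agrees with the optimal score of 1 discussed for Figure 2.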

Corollary 2. Computing the hardwired parsimony score of a phylogenetic network and a 
p-state character, for p ≥ 3, is NP-hard and APX-hard. 

Proof. We reduce from Multiterminal Cut, which is NP-hard and APX-hard for three or 
more terminals [7]. 

Let G be an undirected graph and Γ = {γ_1, ..., γ_k} a set of terminals. Note that feasible 
solutions to Multiterminal Cut must contain all edges between adjacent terminals. For 
this reason we begin by removing such edges from G. Next we repeatedly delete all degree-1 
nodes that are not terminals, until no such nodes are left, because the edges adjacent to such 
nodes cannot contribute to a multiterminal cut. 

We construct a finite set X and a k-state character α on X as follows. For each terminal γ_i, 
and for each node v_i^j adjacent to γ_i in G, put an element x_i^j in X and set α(x_i^j) = i. Now we 
construct a new graph N from G by deleting each γ_i and adding a leaf labelled x_i^j with an edge 
(v_i^j, x_i^j), for each x_i^j ∈ X. Now, N might contain degree-2 nodes, which are not permitted in 
our definition of phylogenetic network, but as we explain in the Appendix there is a simple 
transformation that removes such nodes without altering the hardwired parsimony score or 
the cut properties of the graph. We apply this transformation to N if necessary. Suppose 
then that the resulting graph N is connected, and hence an unrooted phylogenetic network 
on X. Then it follows from Theorem 1 that the size of a minimum multiterminal cut of (G, Γ) 
is equal to PS_hw(N, α). 

Now suppose that N is not connected. Observe that the proof of Theorem 1 still holds 
if N is not connected. Moreover, computing the hardwired parsimony score of a connected 
unrooted phylogenetic network is at least as hard as computing the hardwired parsimony score 
of a not-necessarily connected phylogenetic network, because we can sum the parsimony scores 
of the connected components. This reduction is clearly approximation-preserving. Finally we 
note that computing the hardwired parsimony score of a rooted phylogenetic network is just 
as hard as computing this score of an unrooted phylogenetic network because the hardwired 
parsimony score does not depend on the orientation of the edges. □ 

Corollary 3. Computing the hardwired parsimony score of a phylogenetic network and a
p-state character is fixed-parameter tractable (FPT) in the parsimony score. Moreover, there 
exists a polynomial-time 1.3438-approximation for all p and a 12/11-approximation for p = 3. 
Proof. The approximation results follow from the corresponding results on minimum
multiterminal cut [22] by Theorem 1. 

For the fixed-parameter tractability, we use the corresponding result on the problem
Multicut, which is defined as follows. Given a graph G and q terminal pairs (γ_1, γ'_1), ..., (γ_q, γ'_q), 
find a minimum-size subset E' of the edges of G such that there is no path from γ_i to γ'_i, 
for i = 1, ..., q, in the graph obtained from G by removing the edges of E'. Clearly,
Multiterminal Cut can be reduced to Multicut by creating a terminal pair (γ, γ') for each 
combination of two terminals γ, γ' ∈ Γ with γ ≠ γ'. Hence, since Multicut is fixed-parameter 
tractable in the size of the cut [26, 6], it follows by Theorem 1 that computing the hardwired 
parsimony score of a phylogenetic network and a p-state character is fixed-parameter tractable 
in the parsimony score. □ 
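
The reduction from Multiterminal Cut to Multicut used in this proof only changes how the terminals are presented. A minimal sketch, with the hypothetical helper name `multicut_pairs` and placeholder terminal names:

```python
from itertools import combinations

def multicut_pairs(terminals):
    """One Multicut terminal pair for every unordered pair of distinct terminals."""
    return list(combinations(sorted(set(terminals)), 2))

print(multicut_pairs(["g1", "g2", "g3"]))
# [('g1', 'g2'), ('g1', 'g3'), ('g2', 'g3')]
```

For p terminals this yields p(p − 1)/2 pairs, so the Multicut instance is only polynomially larger and the size of the cut, which is the parameter, is unchanged.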

4. Computing the softwired parsimony score of a rooted phylogenetic network 

In the following, we show that computing the softwired parsimony score of a binary
character on a binary rooted phylogenetic network is NP-hard. We reduce from Cluster
Containment, which is known to be NP-hard for general networks [20, 17]. However, in order 
to prove our result for binary networks, we first need to show that Cluster Containment 
is NP-hard for binary phylogenetic networks, too; this intermediate result has to the best of 
our knowledge not appeared earlier in the literature. We do this via Edge Cluster
Containment and Binary Edge Cluster Containment as described in the following. Thus, 
we state the following questions and analyze their complexity in the subsequent lemmas. 

(Binary) Edge Cluster Containment 
Instance: A set X of taxa, a rooted (binary) phylogenetic network N (with edge set E and 
node set V) on X, a cluster C ⊂ X and an edge e = (u, v) ∈ E. 
Question: Is there a rooted phylogenetic tree T displayed by N and rooted at node v such 
that the taxa descending from v are precisely the taxa in C? 

Note that if the answer is yes to the question above, we say that e represents C. We denote 
by C(N) the set of clusters represented by edges in E. 

(Binary) Cluster Containment 
Instance: A set X of taxa, a rooted (binary) phylogenetic network N (with edge set E and 
node set V) on X and a cluster C ⊂ X. 
Question: Is there a rooted phylogenetic tree T displayed by N which contains an edge 
e = (u, v) ∈ E such that the taxa descending from v are precisely the taxa in C? 

We now state the following lemma. 
Lemma 1. Edge Cluster Containment is NP-hard. 

Proof. We give a (Turing) reduction from Cluster Containment. Assume there is an 
algorithm A to decide Edge Cluster Containment in polynomial time and let N be a 
phylogenetic network on X. Then one can apply A to all edges in E of N (at a cost of |E| 
times the complexity of A, which is polynomial in the input size by assumption) to see if N 
contains an edge e which represents C. By definition, N represents C if and only if N contains 
an edge e which represents C. So this method would provide a polynomial-time algorithm to 
solve Cluster Containment. □ 

Figure 3. Illustration of a rooted phylogenetic network N with a node v of 
total degree 6 and a possible binary refinement N^B of N, where three copies 
of v, namely v', v'' and v''', as well as three new edges (dashed lines) are 
inserted. 
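
The Turing reduction in the proof of Lemma 1 has a simple shape: Cluster Containment is answered by querying the edge version once per edge. The sketch below mirrors that shape, but replaces the hypothetical polynomial-time algorithm A with an exhaustive search over switchings, so it runs in exponential time and is suitable only for small examples. All function names and the toy network are assumptions made for this illustration.

```python
from itertools import product

def switchings(edges):
    """Yield every switching: keep exactly one incoming edge per reticulation."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    retics = [v for v, ps in parents.items() if len(ps) >= 2]
    for choice in product(*(parents[v] for v in retics)):
        kept = dict(zip(retics, choice))
        yield [(u, v) for u, v in edges if v not in kept or kept[v] == u]

def taxa_below(edges, v, taxa):
    """Taxa reachable from v along the directed edges of a switching."""
    children = {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
    stack, found = [v], set()
    while stack:
        w = stack.pop()
        if w in taxa:
            found.add(w)
        stack.extend(children.get(w, []))
    return found

def edge_represents(edges, taxa, edge, cluster):
    """Edge Cluster Containment, decided by exhaustive search (stand-in for algorithm A)."""
    return any(edge in s and taxa_below(s, edge[1], taxa) == set(cluster)
               for s in switchings(edges))

def cluster_contained(edges, taxa, cluster):
    """Cluster Containment: one edge query per edge, as in the proof of Lemma 1."""
    return any(edge_represents(edges, taxa, e, cluster) for e in edges)

edges = [("root", "a"), ("root", "b"), ("a", "x1"), ("b", "x2"),
         ("a", "h"), ("b", "h"), ("h", "x3")]
taxa = {"x1", "x2", "x3"}
print(cluster_contained(edges, taxa, {"x1", "x3"}),  # True
      cluster_contained(edges, taxa, {"x1", "x2"}))  # False
```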

Next we use the previous lemma to prove the following result. 

Lemma 2. Binary Edge Cluster Containment is NP-hard. 

Proof. We reduce from Edge Cluster Containment. Assume there is an algorithm A to 
decide Binary Edge Cluster Containment in polynomial time. Let N be a phylogenetic 
network on X containing an edge e. We want to know if e represents a particular cluster 
C. Let N^B be an arbitrary binary refinement of N. Note that N^B contains all edges of N, 
and possibly some more (unless N is already binary), in the sense that edges in N^B could 
be contracted to once again obtain N. Hence, e is contained in N^B, too. An example of a 
binary refinement of a nonbinary network is depicted in Figure 3. So we can use A to decide 
if e represents C in N^B. Note that e represents C in N^B if and only if e represents C in N, 
because it is easy to see that refining a network does not change the clusters pending on a 
particular edge. Therefore, this method would provide a polynomial-time algorithm to solve 
Edge Cluster Containment. □ 

Now we are in a position to prove that Binary Cluster Containment is NP-hard, which 
is the essential ingredient to our proof of Theorem 2. 

Lemma 3. Binary Cluster Containment is NP-hard. 

Proof. We reduce from Binary Edge Cluster Containment. Let N^B be a rooted binary 
phylogenetic network on X and C a cluster of X. Assume there is an algorithm A to answer 
Binary Cluster Containment in polynomial time. Let e = (v, u) be an edge in N^B. 
We add two new nodes v_1, v_2 to N^B as follows: Subdivide e into three edges e_1 := (v, v_1), 
e_2 := (v_1, v_2) and e_3 := (v_2, u). Now introduce two new edges e_4 := (v_1, h_2) and e_5 := (v_2, h_1), 
where h_1 and h_2 are two new taxa. We call the resulting modified network N̂^B. An example 
of this transformation is depicted in Figure 4. 

Note that by construction, N̂^B is binary. We now use algorithm A to decide in polynomial 
time if N̂^B contains the cluster C ∪ {h_1}. Note that this is the case if and only if e_2 in 
N̂^B represents C ∪ {h_1}, which, by construction, is the case if and only if e represents C in N^B. 
Therefore, this method would provide a polynomial-time algorithm to solve Binary Edge 
Cluster Containment. □ 

Figure 4. Illustration of the modifications applied to N^B as depicted by 
Figure 3, resulting in the modified binary network N̂^B. 

The following theorem was shown by Jin et al. [18] for nonbinary networks. The advantage 
of the proof given below is that it shows that the problem is even NP-hard for binary networks, 
demonstrates a direct and insightful relationship between cluster containment and parsimony, 
and leads directly to the conclusion that the problem is not even fixed-parameter tractable 
(unless P=NP). 

Theorem 2. Computing the softwired parsimony score of a binary character on a binary 
rooted phylogenetic network is NP-hard. 

Proof. We reduce from Binary Cluster Containment. Let N be a rooted binary
phylogenetic network on taxon set X and let C ⊂ X be a cluster. Then, by definition of C(N), C 
is in C(N) if and only if there is a tree T displayed by N with an edge e = (u, v) such that the 
taxa descending from v are precisely the elements of C. This is the case if and only if v is the 
root of a subtree of T with leaf set C. Now assume that there is an algorithm A to compute 
the softwired parsimony score of a binary character on a rooted binary phylogenetic network 
in polynomial time. Then, we can solve Binary Cluster Containment by the following 
algorithm A': 

(1) Introduce a modified version N̂ of N as follows: Add an additional taxon z to N and 
a new node ρ̂ as well as the edges (ρ̂, z) and (ρ̂, ρ), where ρ is the root of N. Thus, 
the taxon set X̂ of N̂ is X ∪ {z} and the root of N̂ is ρ̂. 

(2) Construct a binary character α on X̂ as follows: 

\[ \alpha(x) = \begin{cases} 1 & \text{if } x \in C, \\ 0 & \text{if } x \in \hat{X} \setminus C. \end{cases} \]

Note that α(z) = 0 as z ∉ X and thus z ∉ C. 

(3) Calculate the parsimony score PS_sw(N̂, α) using algorithm A. 

Note that PS_sw(N̂, α) = 1 if and only if N displays a tree T which has a subtree with label 
set C. This is due to the fact that, as α(z) = 0, the softwired parsimony score of N̂ can only 
be 1 if ρ̂ and ρ receive state 0. Otherwise, there would be a change required on one of the 
edges (ρ̂, z) or (ρ̂, ρ) and additionally at least one more change in the part of N̂ corresponding 
to N, as X employs both states 1 and 0 for taxa in or not in C, respectively, because C ≠ X. 
Moreover, if ρ is in state 0, the softwired parsimony score of N̂ is 1 precisely if N displays a 
tree T which only requires one change, and that change has to be a change from 0 to 1. This 
is the case if and only if N displays a tree T with a subtree with leaf labels C. This case 
is illustrated by Figure 5. Note that if A is polynomial, so is A'. Therefore, computing the 
softwired parsimony score of a binary character on a binary rooted phylogenetic network is 
NP-hard. □ 
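
Steps (1) to (3) of this reduction can be tried out on a small example. In the sketch below, `reduction_instance` builds the extended network and the binary character, and `softwired_score` stands in for the assumed algorithm A by brute force: it minimises over all switchings and all 0/1 extensions, which gives the softwired score by the switching characterisation the paper establishes in Lemma 4. The node names `new_root` and `z`, both function names, and the toy network are assumptions made for this example, and the search takes exponential time.

```python
from itertools import product

def softwired_score(edges, alpha):
    """Brute-force softwired score: best switching combined with best 0/1 extension."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    retics = [v for v, ps in parents.items() if len(ps) >= 2]
    nodes = sorted({x for e in edges for x in e})
    internal = [v for v in nodes if v not in alpha]
    best = len(edges)
    for choice in product(*(parents[v] for v in retics)):
        kept = dict(zip(retics, choice))   # the one surviving parent of each reticulation
        s_edges = [(u, v) for u, v in edges if v not in kept or kept[v] == u]
        for states in product((0, 1), repeat=len(internal)):
            tau = {**alpha, **dict(zip(internal, states))}
            best = min(best, sum(tau[u] != tau[v] for u, v in s_edges))
    return best

def reduction_instance(edges, root, taxa, cluster):
    """Steps (1) and (2): add a new root with extra taxon z, give state 1 exactly to C."""
    new_edges = [("new_root", "z"), ("new_root", root)] + list(edges)
    alpha = {x: (1 if x in cluster else 0) for x in taxa}
    alpha["z"] = 0                          # z lies outside X, hence outside C
    return new_edges, alpha

edges = [("root", "a"), ("root", "b"), ("a", "x1"), ("b", "x2"),
         ("a", "h"), ("b", "h"), ("h", "x3")]
taxa = {"x1", "x2", "x3"}
contained = reduction_instance(edges, "root", taxa, {"x1", "x3"})
missing = reduction_instance(edges, "root", taxa, {"x1", "x2"})
print(softwired_score(*contained), softwired_score(*missing))  # 1 2
```

In the toy network the cluster {x1, x3} appears below node `a` once the edge from `b` to the reticulation is switched off, so the score is 1. No displayed tree has {x1, x2} as a cluster, so the score is larger than 1.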

We can extend the NP-hardness result to a more restricted class of rooted phylogenetic 
networks. 

Theorem 3. Computing the softwired parsimony score of a binary tree-child time-consistent 
rooted phylogenetic network and a binary character is NP-hard. 

Proof. We can make any network tree-child and time-consistent by hanging a cherry in the 
middle of each reticulation edge. If we give the two leaves of each cherry character states 0 
and 1, then the softwired parsimony score is increased exactly by the number of added cherries. □ 

Corollary 4. It is NP-hard to decide if the softwired parsimony score of a binary rooted 
time-consistent phylogenetic network and a binary character is equal to one. In particular, 
there is no fixed-parameter tractable algorithm with the parsimony score as parameter unless 
P = NP. 

Proof. In [17], it has been proven that Cluster Containment is NP-hard even for
time-consistent networks by reducing from Cluster Containment on general networks. Since 
the proof in [17] transforms the input network in a way that preserves binarity, the corollary 
follows directly from the combination of the result in [17] and Theorem 2. □ 

Before proceeding to the question of approximability, we require some new auxiliary
definitions. Recall that S(N) is the set of all switchings of a network. Given a rooted phylogenetic 
network N, we define PS_S(N, α) as: 

\[ PS_S(N, \alpha) = \min_{S \in \mathcal{S}(N)} \min_{\tau} \sum_{e \in E(S)} c_\tau(e), \]

where the second minimum is taken over all extensions τ of α to V(S). 

Figure 5. Illustration of the extension of a rooted binary phylogenetic network
N (solid lines) to the rooted binary phylogenetic network N̂ as described 
in the proof of Theorem 2. The additional taxon z is assigned state 0 along 
with all taxa in X \ C, whereas all taxa in C are assigned state 1. Then, the 
softwired parsimony score of N̂ is 1 if and only if N displays a tree T with a 
pending subtree with leaf set C. 

The following result shows that optimal solutions can equivalently be modelled as selecting
the lowest-score switching, ranging over all extensions $\tau$ of a character $\alpha$ to the nodes of
the network. This is the characterisation of optimality used in Section 6 and enables us to
circumvent some of the suppression and deletion technicalities associated with the concept
"display".

Lemma 4. Consider a rooted phylogenetic network N on X and a p-state character $\alpha$ on X.
Then

$$PS_S(N,\alpha) = PS_{sw}(N,\alpha).$$



Proof. Let S be a switching of N and $\tau$ an extension of $\alpha$ to $V(S)$ (or equivalently to
$V(N)$) such that $\sum_{e \in E(S)} c_\tau(e) = PS_S(N,\alpha)$. Let T be the tree obtained from S by deleting
indegree-0 outdegree-1 nodes, deleting unlabelled outdegree-0 nodes and suppressing
indegree-1 outdegree-1 nodes, and let $\tau'$ be the restriction of $\tau$ to the nodes of S still present in T.

Figure 6. An example of an edge (u, v) of T that has been mapped to several
edges in the switching underlying T, used in the proof of Lemma 4.

By construction, since S is a switching of N on X and T has been obtained from S as
described above, we have $T \in \mathcal{T}(N)$. Moreover, since $\tau$ is an extension of $\alpha$ to $V(S)$,
$\tau'$ is an extension of $\alpha$ to $V(T)$. Finally, it is easy to see that
$PS_S(N,\alpha) = \sum_{e \in E(S)} c_\tau(e) \ge \sum_{e \in E(T)} c_{\tau'}(e) \ge PS_{sw}(N,\alpha)$, since suppressing nodes
(and consequently edges) cannot increase the sum of changes on the remaining edges.

Now, let T be a tree of $\mathcal{T}(N)$ and $\tau$ an extension of $\alpha$ to $V(T)$ such that
$\sum_{e \in E(T)} c_\tau(e) = PS_{sw}(N,\alpha)$. Moreover, let S be a switching corresponding to T, i.e. such that T
can be obtained from S by deleting indegree-0 outdegree-1 nodes, deleting unlabelled outdegree-0
nodes and suppressing indegree-1 outdegree-1 nodes. We know that such a switching exists
because $T \in \mathcal{T}(N)$. Now, let $\tau'$ be a labelling of $V(S)$ such that $\tau'(u) = \tau(u)$ if $u \in V(T)$
(i.e. u is the image in N of a node of T) and $\tau'(u) = \{?\}$ otherwise. A value in $\{1,\dots,p\}$ is
associated to all nodes u of S having $\tau'(u) = \{?\}$ in the following way: we start by setting
$\tau'(\mathrm{root}(S))$ to $\tau(\mathrm{root}(T))$ and then we traverse S in preorder, setting $\tau'(u)$ to $\tau'(u_p)$ for all
nodes u having $\tau'(u) = \{?\}$, where $u_p$ is the parent node of u.

First note that the root of T corresponds to the node $\rho$ of S that is closest to the root with
the following property: $\rho$ has outdegree 2 or higher, and nodes not labelled ? can be reached
from at least two children of $\rho$.

Then we have that all edges of S not reachable from $\rho$ cost 0, since for all these edges (u, v)
we have $\tau'(u) = \tau'(v) = \tau'(\mathrm{root}(T))$. Now, let e = (u, v) be an edge of T. In S this edge will
often correspond to a set of edges, denoted by $E_S(e)$; see Figure 6. Now, note that the value
of $\tau'(\cdot)$ is equal to $\tau(u)$ for all descendants of u in S that cannot be reached via v. Then it is
easy to see that the cost of all edges in $E_S(e)$ equals $c_{\tau'}(w,v) = c_\tau(u,v) = c_\tau(e)$, where w is
the parent node of v in S. Since this holds for all edges of T, and $\tau'$ is clearly an extension
of $\alpha$ to $V(S)$, we have that $PS_{sw}(N,\alpha) = \sum_{e \in E(T)} c_\tau(e) = \sum_{e \in E(S)} c_{\tau'}(e) \ge PS_S(N,\alpha)$. This
concludes the proof. □

The following straightforward corollary will be useful when describing
approximation-preserving reductions.

Corollary 5. Given a network N and a character $\alpha$ on X, a tree $T \in \mathcal{T}(N)$, a switching
$S \in \mathcal{S}(N)$ corresponding to T and an extension $\tau$ of $\alpha$ to $V(T)$, we can construct in polynomial
time an extension $\tau'$ of $\alpha$ to $V(N)$ such that $\sum_{e \in E(S)} c_{\tau'}(e) = \sum_{e \in E(T)} c_\tau(e)$.

We now show that it is very hard to approximate the softwired parsimony score on rooted
networks, even in the case of binary characters. The first inapproximability result applies to
binary networks. The second, stronger inapproximability result applies to general networks
and holds even when the network is both tree-child and time-consistent. This shows that
in a complexity-theoretic sense trivial approximation algorithms are the best one can hope
for. Both results are much stronger than the APX-hardness result presented in [18]. At the
present time we do not have an inapproximability result for networks that are simultaneously
binary, tree-child and time-consistent: in this sense Theorem 3 is currently the strongest
hardness result we have for such networks.

Before proceeding we formally define the output of an algorithm that approximates $PS_{sw}(N,\alpha)$
as a tree $T \in \mathcal{T}(N)$ and a certificate that $T \in \mathcal{T}(N)$, i.e. a switching $S \in \mathcal{S}(N)$ corresponding
to T. The certificate is necessary because it is NP-hard to determine whether a tree is
displayed by a network [20]. The parsimony score (i.e. value of the objective function) associated
with the output T is then

$$PS(T,\alpha) = \min_{\tau} \sum_{e \in E(T)} c_\tau(e),$$

where the minimum is taken over all extensions $\tau$ of $\alpha$ to $V(T)$. Note that $PS(T,\alpha)$ can easily
be computed in polynomial time by applying Fitch's algorithm to T. If necessary Corollary 5
can then be applied to transform this in polynomial time into an extension of $\alpha$ to $V(N)$
such that the switching corresponding to T, i.e. our certificate, has parsimony score at most
$PS(T,\alpha)$.

Consider the following simple observation. 

Observation 1. The softwired parsimony score of a rooted phylogenetic network N on X
and a p-state character $\alpha$ on X can be (trivially) approximated in polynomial time with
approximation factor $|X|$, for any $p \ge 2$.

Proof. Let $s \in \{1,\dots,p\}$ be a state to which at least a fraction 1/p of X is mapped by $\alpha$.
Let T be an arbitrary tree in $\mathcal{T}(N)$. We extend $\alpha$ to $V(T)$ by labelling all internal nodes
of T with s. Clearly, $PS_{sw}(N,\alpha) = 0$ if and only if $\alpha$ maps all elements in X to the same
character state, in which case the extension of $\alpha$ to $V(T)$ also yields a parsimony score of 0.
Otherwise, $PS_{sw}(N,\alpha) \ge 1$ and the extension described yields a parsimony score of at most
$(1 - 1/p)|X| < |X|$, from which the result follows. □
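The labelling used in this proof is easy to evaluate without looking at the tree at all: if every internal node receives a most frequent leaf state s, changes occur only on the edges entering leaves whose state differs from s. The following is a minimal sketch of that count; the function name and the example character are illustrative assumptions, not part of the paper.

```python
from collections import Counter


def trivial_extension_score(leaf_state):
    """Score of the extension that labels every internal node with a most
    frequent leaf state s.

    leaf_state maps each taxon to its character state. Changes occur only
    on edges entering leaves whose state differs from s, so the score is
    the number of such leaves, whichever tree displayed by N is used.
    """
    counts = Counter(leaf_state.values())
    s, frequency = counts.most_common(1)[0]
    return s, len(leaf_state) - frequency


# Hypothetical 3-state character on four taxa.
leaf_state = {"a": 1, "b": 1, "c": 2, "d": 3}
s, score = trivial_extension_score(leaf_state)
```

Here state 1 is the most frequent, two leaves differ from it, and the resulting score respects the bound (1 - 1/p)|X| with p = 3.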

The following theorem shows that, in an asymptotic sense, Observation 1 is actually the
best approximation possible, even when the topology of the network is quite heavily restricted.

Theorem 4. For every constant $\epsilon > 0$ there is no polynomial-time approximation algorithm
that can approximate $PS_{sw}(N,\alpha)$ to a factor $|X|^{1-\epsilon}$, where N is a tree-child, time-consistent
network and $\alpha$ is a binary character on X, unless P = NP.

Proof. We reduce from the NP-hard decision problem 3-SAT. This is the problem of
determining whether a boolean formula in CNF, where each clause contains at most 3 literals,
is satisfiable. Let B = (V, C) be an instance of 3-SAT, where V is the set of variables and
C is the set of clauses. Let $|V| = n$. Observe that $|C| = m$ is at most $O(n^3)$ because in a
decision problem it makes no sense to include repeated clauses.

For each constant $\epsilon > 0$, we will show how to construct a parsimony instance $(N,\alpha)$ such
that the existence of a polynomial-time $|X|^{1-\epsilon}$-approximation would allow us to determine
in polynomial time whether B is a YES or a NO instance, from which the theorem will
follow. The construction can be thought of as an "inapproximability" variant of the hardness
construction used by Kanj et al. in [20].

Throughout the proof we will make heavy use of the equivalence described in Lemma 4.
Specifically, we will characterise optimal solutions to the softwired parsimony problem as the
score yielded by the lowest-score switching, ranging over all extensions of $\alpha$ to $V(N)$.

We begin by proving the result for networks that are time-consistent, but not tree-child.
Later we will show how to extend the result to networks that are time-consistent and
tree-child.

The centrepiece of the construction is the following variable gadget. Let z be a variable
in V. We introduce two nodes which we refer to as $z$ and $\neg z$, and name them collectively
connector nodes. We introduce two sets of taxa, $X_{z,0}$ and $X_{z,1}$, each containing $f(n,\epsilon)$ taxa,
where $f(n,\epsilon)$ is a function that we will specify later. For each taxon $x \in X_{z,i}$ we set $\alpha(x) = i$.
By introducing $2 \cdot f(n,\epsilon)$ reticulation nodes we connect each taxon in $X_{z,0}$ and $X_{z,1}$ to both
$z$ and $\neg z$ (see Figure 7). Observe that if both $z$ and $\neg z$ are labelled with the same character
state, the parsimony score of this gadget (and thus of the network as a whole) will be at least
$f(n,\epsilon)$. On the other hand, if $z$ and $\neg z$ are labelled with different character states, the gadget
contributes (locally) zero to the parsimony score. The idea is thus that we label $(z, \neg z)$ with
(1, 0) if we wish to set variable z to be TRUE, and (0, 1) if we wish z to be FALSE, i.e. $\neg z$ is
TRUE. By choosing $f(n,\epsilon)$ to be very large we will ensure that $z$ and $\neg z$ are never labelled
with the same character state in "good" solutions.

We construct one variable gadget for each $z \in V$. Next we add the root $\rho$ and two nodes $s_0$
and $s_1$. We connect $\rho$ to $s_0$ and to $s_1$. Next, we connect $s_0$ (respectively, $s_1$) to every connector
node (ranging over all variable gadgets). Hence, every connector node has indegree 2. The
idea is that (without loss of generality) $s_0$ (respectively, $s_1$) can be assumed to be labelled
0 (respectively, 1). Therefore, if a connector node is labelled with state 0 (respectively, 1),
it will choose $s_0$ (respectively, $s_1$) to be its parent, and these edges will not contribute any
mutations to the parsimony score. There are two points to note here. Firstly, there will be
exactly one mutation incurred on the two edges $(\rho, s_0)$ and $(\rho, s_1)$, and this has an important
role in the ensuing inapproximability argument; we shall return to this later. Secondly, it
could happen that the labelling of $s_0$ and $s_1$ is (1, 0) rather than (0, 1), but in that case the
analysis is entirely symmetrical.

It remains only to describe the clause gadgets. These are very simple. For each clause
$c \in C$ we introduce a size-$f(n,\epsilon)$ set of taxa that we call $X_c$. For each taxon $x \in X_c$ we set
$\alpha(x) = 1$. By introducing $f(n,\epsilon)$ nodes (these will be reticulations, unless the clause contains
only one literal) we connect each taxon in $X_c$ to the connector nodes in the variable gadgets
corresponding to the literals in the clause. For example, if c is the clause $(\neg x \vee y \vee z)$, each
node in $X_c$ has $\neg x$, $y$ and $z$ as its parents. Similarly to the variable gadgets, observe that if
none of the literals corresponding to c are labelled 1 (i.e. set to TRUE), the clause gadget
corresponding to c will raise the parsimony score by at least $f(n,\epsilon)$, but if at least one
literal is TRUE, the (local) parsimony cost will be zero.
Figure 7. An encoding of the 3-SAT instance $(x \vee \neg y) \wedge (\neg x \vee y \vee z)$ as
described in Theorem 4. Note that the network is time-consistent (the root is
allocated time-stamp 1 and all other nodes are allocated time-stamp 2), but
not tree-child: a slight modification is required to make it tree-child.



Observe firstly that $PS_{sw}(N,\alpha) \ge 1$ because both character states appear in the range of
$\alpha$. More fundamentally, $PS_{sw}(N,\alpha) = 1$ if B is satisfiable (in which case the single mutation
occurs on one of the edges $(\rho, s_0)$ and $(\rho, s_1)$) and $PS_{sw}(N,\alpha) \ge f(n,\epsilon)$ if B is unsatisfiable.
This dichotomy holds because, in order to have $PS_{sw}(N,\alpha) < f(n,\epsilon)$, it is necessary that the
connector nodes in the variable gadgets always have a labelling of the form (0, 1) or (1, 0), and
that for every clause c the nodes in $X_c$ all have at least one TRUE parent, i.e. B is satisfiable.

The high-level idea is to choose $f(n,\epsilon)$ to be so large that even a weak approximation
factor will be sufficient to determine without error whether B is satisfiable or unsatisfiable.
Before explaining how to choose $f(n,\epsilon)$ we formally describe the steps in the reduction.
Let $T \in \mathcal{T}(N)$ be the tree produced by the approximation algorithm for $PS_{sw}(N,\alpha)$, and
$S \in \mathcal{S}(N)$ a corresponding switching. We compute $PS(T,\alpha)$ and let $\tau$ be any extension of $\alpha$
to $V(T)$ that achieves parsimony score $PS(T,\alpha)$; this can all be done in polynomial time using
Fitch's algorithm. If $PS(T,\alpha) \ge f(n,\epsilon)$ we declare that the SAT instance B is unsatisfiable.
Otherwise, we declare that B is satisfiable. Note that, by Corollary 5, we can use T, S and $\tau$
to obtain in polynomial time an extension $\tau'$ of $\alpha$ to $V(N)$ such that the parsimony score of S
under $\tau'$ is also strictly less than $f(n,\epsilon)$. Therefore, for each variable z in the SAT instance,
$\tau'$ has to label $z$ and $\neg z$ with different character states, and for each clause $c \in C$ in the SAT
instance, at least one of its literals has to be labelled with character state 1. The satisfying
assignment is thus: for each variable z, z is TRUE if $z$ is labelled 1, and FALSE if $\neg z$ is
labelled 1.

We now show how to choose $f(n,\epsilon)$ such that $|X|^{1-\epsilon} \cdot 1 < f(n,\epsilon)$. When chosen this
way, an approximation algorithm with approximation factor $|X|^{1-\epsilon}$ will be forced to return
a solution that, as we have just described, can be transformed into a satisfying assignment of
B, whenever that is possible. This will be the only option, because returning a solution with
parsimony score $f(n,\epsilon)$ or higher will be more than a factor $|X|^{1-\epsilon}$ larger than the optimum,
which is 1 in the case of satisfiability. Now, observe that $|X| = 2n \cdot f(n,\epsilon) + m \cdot f(n,\epsilon)$. Given
the relationship between n and m, a (crude) upper bound on $|X|$ is $n^5 \cdot f(n,\epsilon)$, for sufficiently
large n. Hence it is sufficient to ensure $(n^5 \cdot f(n,\epsilon))^{1-\epsilon} < f(n,\epsilon)$. Suppose $f(n,\epsilon) = n^{g(\epsilon)}$,
where $g(\epsilon)$ is a function that only depends on $\epsilon$. Then we need $g(\epsilon)(1-\epsilon) + 5(1-\epsilon) < g(\epsilon)$,
which implies that taking $g(\epsilon) = \lceil 6\epsilon^{-1}(1-\epsilon) \rceil$ is sufficient.
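The arithmetic behind this choice of the exponent can be spelled out step by step. The following is a worked check under the assumptions already stated in the text (the crude bound $|X| \le n^5 \cdot f(n,\epsilon)$ and $f(n,\epsilon) = n^{g(\epsilon)}$), with the additional assumptions $0 < \epsilon < 1$ and $n \ge 2$ so that comparing exponents is valid.

```latex
\begin{align*}
|X|^{1-\epsilon}
  &\le \bigl(n^{5} \cdot n^{g(\epsilon)}\bigr)^{1-\epsilon}
   = n^{(5+g(\epsilon))(1-\epsilon)}, \\
n^{(5+g(\epsilon))(1-\epsilon)} < n^{g(\epsilon)}
  &\iff 5(1-\epsilon) + g(\epsilon)(1-\epsilon) < g(\epsilon)
   \iff g(\epsilon) > \frac{5(1-\epsilon)}{\epsilon}, \\
g(\epsilon) = \bigl\lceil 6\epsilon^{-1}(1-\epsilon) \bigr\rceil
  &\ge \frac{6(1-\epsilon)}{\epsilon} > \frac{5(1-\epsilon)}{\epsilon}.
\end{align*}
```

For instance, $\epsilon = 1/2$ gives $g(\epsilon) = 6$, and indeed $5 \cdot \tfrac12 + 6 \cdot \tfrac12 = 5.5 < 6$.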

The network we constructed above is time-consistent (see Figure 7) but not tree-child:
potentially only the root $\rho$ has at least one child that is not a reticulation. We can transform
the network as follows. For each node v with indegree greater than 1 and outdegree 0 we
simply add an outgoing edge to a new node $v'$, where $v'$ receives time-stamp 3, and $\alpha(v')$
takes over the character state $\alpha(v)$. Next we introduce 2n + 2 new taxa. For $s_i$, $i \in \{0,1\}$, we
introduce a new node $s_i'$ (with time-stamp 3), add an edge $(s_i, s_i')$, and set $\alpha(s_i') = i$. For each
variable z in the SAT instance B, we introduce two new taxa $z'$ and $\neg z'$ (both of which receive
time-stamp 3), add edges $(z, z')$ and $(\neg z, \neg z')$ and set $\alpha(z') = \alpha(\neg z') = 0$. The network is now
both tree-child and time-consistent. Now, observe that the two taxa introduced underneath
$s_0$ and $s_1$ do not change the optimum parsimony score, because without loss of generality
we can assume that $s_0$ is labelled 0 and $s_1$ is labelled 1. However, for each variable z, some
extra mutations might be incurred on the edges $(z, z')$ and $(\neg z, \neg z')$. As long as $f(n,\epsilon)$ is
chosen to be large enough, these (at most) 2n extra mutations do not significantly alter the
reduction: in optimal solutions each $(z, \neg z)$ pair will still be labelled with different character
states, reflecting a satisfying truth assignment for B, whenever B is satisfiable. In fact, if
B is satisfiable, then exactly n extra mutations will be incurred on the edges $(z, z')$ and
$(\neg z, \neg z')$ (in an optimal solution), since exactly one of $z$ and $\neg z$ will be labelled 1. So, if B
is satisfiable, $PS_{sw}(N,\alpha) = n + 1$ and if B is unsatisfiable, $PS_{sw}(N,\alpha) \ge f(n,\epsilon)$. As long as
we choose $|X|^{1-\epsilon}(n+1) < f(n,\epsilon)$, any $|X|^{1-\epsilon}$-approximation will be forced to return a tree
T such that $PS(T,\alpha) < f(n,\epsilon)$ whenever B is satisfiable, and this can be transformed in the
same way as before in polynomial time into a satisfying assignment for B.

Hence we need to choose $f(n,\epsilon)$ such that $|X|^{1-\epsilon}(n+1) < f(n,\epsilon)$, where this time
$|X| = 2n \cdot f(n,\epsilon) + m \cdot f(n,\epsilon) + (2n+2)$. As before, $|X| < n^5 \cdot f(n,\epsilon)$ holds for sufficiently large
n, as does $n + 1 < n^2$. So establishing $f(n,\epsilon) > n^{7-5\epsilon} f(n,\epsilon)^{1-\epsilon}$ would be sufficient. Letting
$f(n,\epsilon) = n^{g(\epsilon)}$ and taking logarithms, it is sufficient to choose $g(\epsilon)$ such that
$g(\epsilon) > 7 - 5\epsilon + g(\epsilon)(1-\epsilon)$. Taking $g(\epsilon) = \lceil (7-5\epsilon)/\epsilon \rceil + 1$ is sufficient for this purpose, and we are done. □

For binary networks, we get a slightly weaker inapproximability result. 

Footnote to the proof of Theorem 4: A possible time-stamp allocates 1 to the root, 2 to all
reticulation nodes (which includes $s_0$ and $s_1$) and 3 to the nodes in any single-literal clause
gadgets, as these are tree nodes.
Theorem 5. For every constant $\epsilon > 0$ there is no polynomial-time approximation algorithm
that approximates $PS_{sw}(N,\alpha)$ to a factor $|X|^{\frac{1}{3}-\epsilon}$, where N is a rooted binary phylogenetic
network on X and $\alpha$ is a binary character on X, unless P = NP.

Proof. Deferred to the Appendix. □ 

In particular, Theorem 5 shows that there can be no $O(\log(|X|))$-approximation for
computing the softwired parsimony score of a binary rooted phylogenetic network, unless P = NP.
We remark that the network constructed in the proof of Theorem 5 cannot easily be made
tree-child. Hence the inapproximability of binary tree-child networks is still open. It does
seem that the constructed network can be made time-consistent but we omit a proof.

Although we have shown above that there is no algorithm for computing the softwired
parsimony score that is fixed-parameter tractable in the parsimony score (unless P = NP),
there obviously exists such an algorithm that is fixed-parameter tractable in the reticulation
number of the network: a network with reticulation number r has $2^r$ switchings, and for
each switching Fitch's algorithm can be used. Moreover, in the next section we show that
there even exists an algorithm that is fixed-parameter tractable in the level of the network, a
parameter potentially much smaller than reticulation number.
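The enumeration over switchings can be sketched in a few lines. This is an illustrative sketch only: the function names, the edge-list representation and the toy network are assumptions, parallel edges are assumed absent, and a small Sankoff-style dynamic program is used on each switching in place of Fitch's algorithm (both give the parsimony score of a tree under unit change costs).

```python
from itertools import product


def _cost(v, children, leaf_state, states):
    """cost[s] = fewest changes in the subtree below v if v has state s."""
    if v in leaf_state:
        return {s: (0 if s == leaf_state[v] else float("inf")) for s in states}
    cost = {s: 0 for s in states}  # unlabelled leaves cost nothing
    for child in children.get(v, []):
        sub = _cost(child, children, leaf_state, states)
        for s in states:
            cost[s] += min(sub[t] + (s != t) for t in states)
    return cost


def softwired_score_bruteforce(edges, root, leaf_state, states):
    """Minimum parsimony score over all switchings of the network."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]
    best = float("inf")
    # A switching keeps exactly one incoming edge of every reticulation.
    for choice in product(*(parents[r] for r in reticulations)):
        kept = dict(zip(reticulations, choice))
        children = {}
        for u, v in edges:
            if v in kept and kept[v] != u:
                continue  # this reticulation edge is switched off
            children.setdefault(u, []).append(v)
        scores = _cost(root, children, leaf_state, states)
        best = min(best, min(scores.values()))
    return best


# Toy network: leaf e hangs below a reticulation h with parents u and v.
edges = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "b"), ("u", "h"),
         ("v", "h"), ("v", "c"), ("v", "d"), ("h", "e")]
leaf_state = {"a": 0, "b": 0, "e": 0, "c": 1, "d": 1}
score = softwired_score_bruteforce(edges, "r", leaf_state, [0, 1])
```

In this toy instance the switching that keeps edge (u, h) groups e with the other state-0 taxa and needs one change, whereas the switching that keeps (v, h) needs two, so the minimum over switchings is 1. The loop runs once per switching, which is the source of the exponential dependence on the reticulation number.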



5. An algorithm for computing the softwired parsimony score of a network that is FPT in the level of the network

In Section 5.1 we describe a polynomial-time dynamic programming (DP) algorithm that
works on rooted trees and computes a slight generalisation of the softwired parsimony score.
In Section 5.2 we show how this can be used as a subroutine in computing the softwired
parsimony score of networks, such that the running time is fixed-parameter tractable (FPT)
in the level of the network.



5.1. A DP algorithm for (not necessarily phylogenetic) rooted trees with weights.

Let $\mathcal{P} = \{1,\dots,p\}$ be the set of character states. Let T be a rooted tree, and let $L(T)$ be the
set of leaves of T. T is not necessarily a phylogenetic tree because only a subset $L \subseteq L(T)$
needs to be labelled, and T is allowed to have nodes with indegree and outdegree both equal
to 1. (Later on, we will see that this allows us to model switchings.) For a node $v \in V(T)$,
let $T_v$ be the subtree of T rooted at v.

We are given a p-state character $\alpha : L \to \mathcal{P}$. Additionally, we are given a function
$w : (V(T) \times \mathcal{P}) \to \mathbb{N}$, where $\mathbb{N} = \{0, 1, \dots\}$.

Consider the following definition, where $c_\tau$ is the change function described in the
preliminaries and the minimum ranges over all extensions $\tau$ of $\alpha$ to $V(T)$.

$$PS_{sw}(T,\alpha,w) = \min_{\tau} \left( \sum_{e \in E(T)} c_\tau(e) + \sum_{v \in V(T)} w(v,\tau(v)) \right) \qquad (1)$$

We can think of this as being the parsimony score with an optional added "weighting" that
to varying degrees "penalises" nodes when they are allocated a certain character state. This
weighting $w(v,s)$ will be used in the next section to model the contribution to the optimum
parsimony score of the subnetworks of N rooted at v when v is forced to be labelled with
character state s.
To compute $PS_{sw}(T,\alpha,w)$ we introduce the value $PS_{sw}(T,\alpha,w,s)$, with $s \in \mathcal{P}$, where we
add the restriction that the root of T must be labelled with character state s. Clearly,

$$PS_{sw}(T,\alpha,w) = \min_{s \in \mathcal{P}} PS_{sw}(T,\alpha,w,s) \qquad (2)$$

We denote by $PS_{sw}(T,\alpha,w,\cdot)$ the vector $(PS_{sw}(T,\alpha,w,1), \dots, PS_{sw}(T,\alpha,w,p))$. This vector
is computed as described in Algorithm 1, where $\delta(s,s') = 0$ if $s = s'$ and 1 otherwise, and
$C(v)$ is the set of children of a non-leaf node v. Note that the optimal $\tau$ can be constructed
by backtracking, if necessary.

Algorithm 1: Compute $PS_{sw}(T,\alpha,w,\cdot)$

    for each node v of V(T) considered in post-order do
        if v is a leaf then
            if $v \in L$ then
                $PS_{sw}(T_v,\alpha,w,s) = w(v,s)$ if $s = \alpha(v)$, and $\infty$ otherwise;
            else
                $PS_{sw}(T_v,\alpha,w,s) = w(v,s)$, for each $s \in \mathcal{P}$;
        else
            $PS_{sw}(T_v,\alpha,w,s) = w(v,s) + \sum_{v' \in C(v)} \min_{s' \in \mathcal{P}} \left( PS_{sw}(T_{v'},\alpha,w,s') + \delta(s,s') \right)$, for each $s \in \mathcal{P}$;
    return $PS_{sw}(T,\alpha,w,\cdot)$;   // note that $T = T_{\mathrm{root}(T)}$



The running time of Algorithm 1 is $O(p^2 |V(T)|)$.
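To make the recursion concrete, the following is a minimal Python sketch of the dynamic program of Algorithm 1. The function name, the dictionary-based tree representation and the convention that missing weights default to 0 are assumptions made for illustration; they are not taken from the authors' software.

```python
import math


def weighted_parsimony_vector(children, root, leaf_state, states, w=None):
    """Return {s: PS_sw(T, alpha, w, s)} for every state s.

    children   : dict mapping a node to the list of its children
    leaf_state : dict mapping labelled leaves to their state (alpha)
    w          : optional dict {(node, state): weight}, default weight 0
    """
    def weight(v, s):
        return 0 if w is None else w.get((v, s), 0)

    # Pre-order via a stack; reversing it visits children before parents.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    table = {}
    for v in reversed(order):
        kids = children.get(v, [])
        if not kids:
            if v in leaf_state:  # labelled leaf: only alpha(v) is allowed
                table[v] = {s: weight(v, s) if s == leaf_state[v] else math.inf
                            for s in states}
            else:                # unlabelled leaf: any state is allowed
                table[v] = {s: weight(v, s) for s in states}
        else:
            table[v] = {
                s: weight(v, s) + sum(
                    min(table[c][t] + (s != t) for t in states) for c in kids)
                for s in states
            }
    return table[root]


# Tree ((a,b),(c,d)) with a 3-state character; all weights zero.
children = {"r": ["u", "v"], "u": ["a", "b"], "v": ["c", "d"]}
leaf_state = {"a": 1, "b": 1, "c": 2, "d": 2}
vector = weighted_parsimony_vector(children, "r", leaf_state, [1, 2, 3])
```

With all weights zero the minimum entry of the returned vector is the ordinary parsimony score of the tree (here 1). Each internal node does $O(p^2)$ work per child, which matches the stated running time.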

Lemma 5. Algorithm 1 correctly computes $PS_{sw}(T_v,\alpha,w,\cdot)$, for every $v \in V(T)$. In
particular, it correctly computes $PS_{sw}(T,\alpha,w,\cdot)$.

Proof. (Sketch) This follows from the fact that, if the state of a node v is fixed as s, then
the only local decisions that have to be made to optimize $PS_{sw}(T_v,\alpha,w,s)$ are to choose the
character state $s'$ for each child $v'$ of v. A change is incurred whenever $s' \neq s$. Once $s'$ has
been chosen, we are free to (and therefore should) use optimal subsolutions corresponding to
the case when the root of subtree $T_{v'}$ has state $s'$, i.e. $PS_{sw}(T_{v'},\alpha,w,s')$. We omit details. □

The following lemma shows that Algorithm 1 can be used to compute the parsimony score
of a phylogenetic tree.

Lemma 6. Consider a rooted tree T on X and a p-state character $\alpha$ on X. Then, if $w(v,s) = 0$
for all $v \in V(T)$ and $s \in \mathcal{P}$, then $PS_{sw}(T,\alpha,w) = PS_{sw}(T,\alpha)$.

Proof. This follows by combining Lemma 5 with (1) and (2). In particular, in (1) the
right-hand side of the expression degenerates to the familiar parsimony definition, because w is
0 everywhere. □

5.2. Extending the DP algorithm to networks. Let N be a level-k network on X. We
say that a biconnected component is trivial if it consists of a single edge (a cut-edge). Thanks
to the following results, we can envisage N as comprising non-trivial biconnected components,
each with reticulation number at most k, arranged in a tree-like backbone.
Lemma 7. Let N be a rooted phylogenetic network on X and let B be a biconnected component
of N. Then B contains exactly one node $r_B$ without ancestors in B.

The proof of this result is deferred to the Appendix.

Lemma 8. Let N be a rooted phylogenetic network on X. Then, if r is a reticulation, all
incoming edges of r are in the same biconnected component of N.

Proof. The lemma can be proven by applying a similar argument to that used in the proof of
Lemma 7. We therefore omit the proof. □

We define the switchings of a biconnected component B of N analogously to the definition
of switchings of a network, i.e. a switching of a biconnected component B is a rooted tree $T_B$
that can be obtained from B by deleting all but one of the incoming edges of each reticulation.
We say that we apply switching $T_B$ to N when deleting in N all edges of B not in $T_B$. The
next result is a consequence of Lemma 8.

Lemma 9. Let N be a rooted phylogenetic network on X and $T_N$ a switching of N. Then $T_N$
can be obtained from N by, for each biconnected component B of N, first choosing a switching
$T_B$ and then applying it to N.

Corollary 6. Let N be a rooted phylogenetic network on X, $T_N$ a switching of N and B
a biconnected component of N. Let $T_B$ be the switching of B induced by $T_N$ and let $T'_B$
be a different switching of B. Let $T'_N$ be the graph obtained from N by applying to N all
switchings of all biconnected components induced by $T_N$ except $T_B$, and then finally applying
the switching $T'_B$. Then $T'_N$ is a switching of N.

We are now ready to describe our algorithm for computing the softwired parsimony score
of a phylogenetic network N and p-state character $\alpha$. Note that each cut-edge (u, v) is seen
as a biconnected component with root u and only one switching.

Algorithm 2: Compute $PS_{sw}(N,\alpha)$

    for each node v of N and state s in $\mathcal{P}$ do $w(v,s) \leftarrow 0$;
    for each node r of N that is a root of at least one biconnected component, in post-order do
        for each biconnected component $B_r$ rooted at r do
            for each switching $T_r$ of $B_r$ do
                compute $PS_{sw}(T_r,\alpha,w,\cdot)$ using Algorithm 1;
            for each s in $\mathcal{P}$ do
                update $w(r,s)$ so that, once all components rooted at r are processed, $w(r,s) = \sum_{B_r} \min_{T_r} PS_{sw}(T_r,\alpha,w,s)$;
    return $\min_{s \in \mathcal{P}} w(\mathrm{root}(N), s)$;



Theorem 6. Computing the softwired parsimony score of a rooted phylogenetic network N
and a p-state character, for any $p \in \mathbb{N}$, is fixed-parameter tractable if the parameter is the
level of the network.
Proof. In Lemma 7 we proved that each biconnected component B of N contains only one
root $r_B$. We denote by $BT(N)$ the graph obtained as follows: we create a node $v_r$ in $BT(N)$
for each node r of N that is the root of at least one biconnected component of N, and an
edge $(v_r, v_{r'})$ in $BT(N)$ if $r' \neq r$ and $r'$ is contained in a biconnected component rooted
at r. It is easy to see that $BT(N)$ is connected. Moreover, it cannot contain any reticulation
because of Lemma 8. Thus $BT(N)$ is a tree on X. In the following we shall prove that
$PS_{sw}(N,\alpha) = \min_{s \in \mathcal{P}} w(\mathrm{root}(N), s)$ with w the weight function computed by Algorithm 2.

Denote by $PS_{sw}(N,\alpha,s)$ the minimum parsimony score for N and $\alpha$, with the restriction
that the root of N must be labelled with character state s. Let $N_r$ be the subnetwork
comprising all biconnected components whose roots can be reached by directed paths from r.
We will prove that $PS_{sw}(N_r,\alpha,s) = w(r,s)$ for any node r in $V(N)$ associated to a node $v_r$
in $BT(N)$. We prove this equality by induction on the height of $v_r$, which is defined as the
length of a longest path from $v_r$ to a leaf of $BT(N)$.

We begin by proving that the equality is true when the height of $v_r$ is 0. Suppose that r
is the root of J different non-trivial biconnected components and let $B_j$ be one of these. Then
we have that:

$$PS_{sw}(B_j,\alpha,w,s) = \min_{T_j} PS_{sw}(T_j,\alpha,w,s) = \min_{T_j} PS_{sw}(T_j,\alpha,s) = PS_{sw}(B_j,\alpha,s),$$

where the minima range over the switchings $T_j$ of $B_j$ and the second equality holds because of
Lemma 6, since $w(v,s)$ is equal to zero for all nodes of $B_j$ and s in $\mathcal{P}$. Then, because of
Lemma 8, we have that:

$$w(r,s) = \sum_{B_j \in \mathcal{B}_r} PS_{sw}(B_j,\alpha,w,s) = \sum_{B_j \in \mathcal{B}_r} PS_{sw}(B_j,\alpha,s) = PS_{sw}(N_r,\alpha,s),$$

where $\mathcal{B}_r$ is the set of biconnected components rooted at r.

Suppose now that $w(r,s) = PS_{sw}(N_r,\alpha,s)$ is true for all nodes $v_r$ of $BT(N)$ with height
at most h. We want to prove that this holds also for nodes with height h + 1. Let $v_r$
be such a node, r the associated node in N and let $B_j$ be a biconnected component rooted
at r. Let $N_r(B_j)$ denote the subnetwork of $N_r$ where edges not reachable from $B_j$ are
deleted. Then, when we loop through all switchings of $B_j$, we can use the same kind of
reasoning as before to prove that $PS_{sw}(B_j,\alpha,w,s) = PS_{sw}(N_r(B_j),\alpha,s)$. The idea is that,
once a subnetwork has been processed, its influence in the biconnected component above it
is expressed using the w function. This implies that the claim holds, since
$PS_{sw}(N_r,\alpha,s) = \sum_{B_j \in \mathcal{B}_r} PS_{sw}(N_r(B_j),\alpha,s)$.

We still need to prove the running time. Algorithm 1 has a running time of $O(p^2 |V(T)|)$.
Moreover, for each biconnected component we call Algorithm 1 for at most $2^k$ trees with
at most $|V(N)|$ nodes. Moreover, by Lemma 8, we have that the number of biconnected
components of N is at most $|E(N)|$. Then we have an overall complexity of
$O(2^k p^2 |V(N)| \cdot |E(N)|)$. This concludes the proof. □



6. Maximum parsimony in practice: integer linear programming 

We propose the following integer linear programming (ILP) formulation for computing the
hardwired parsimony score of a phylogenetic network, with node set V and edge set E, and
a p-state character $\alpha$. All variables are binary. Variable $x_{v,s}$ indicates whether or not node v
has character state s and variable $c_e$ indicates if there is a change on edge e or not. For a
leaf v, parameter $\alpha(v)$ is the given character state at v. Let $\mathcal{P} = \{1,\dots,p\}$.
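$$
\begin{array}{lll}
\min & \sum_{e \in E} c_e & \\
\text{s.t.} & \sum_{s \in \mathcal{P}} x_{v,s} = 1 & \text{for all } v \in V \\
 & c_e \ge x_{u,s} - x_{v,s} & \text{for all } e = (u,v) \in E,\ s \in \mathcal{P} \\
 & c_e \ge x_{v,s} - x_{u,s} & \text{for all } e = (u,v) \in E,\ s \in \mathcal{P} \\
 & x_{v,\alpha(v)} = 1 & \text{for each leaf } v \\
 & c_e \in \{0,1\} & \text{for all } e \in E \\
 & x_{v,s} \in \{0,1\} & \text{for all } v \in V,\ s \in \mathcal{P}
\end{array}
$$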



To see the correctness of the formulation, first observe that the first constraint ensures
that each node is assigned exactly one character state. Now consider an edge e = (u, v) and
suppose that u and v are assigned different states s and s'. Then $x_{u,s} \neq x_{v,s}$ (and $x_{u,s'} \neq x_{v,s'}$)
and hence the second and third constraints ensure that $c_e = 1$.
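This argument can be checked exhaustively for a single edge. The sketch below is not an ILP solver: it simply enumerates the one-hot encodings of the two endpoint states and finds the smallest binary value of the change variable that satisfies both families of change constraints. The helper names are illustrative assumptions.

```python
from itertools import product


def one_hot(state, p):
    """Encode a state in {1, ..., p} as the 0/1 vector (x_{v,1}, ..., x_{v,p})."""
    return tuple(int(s == state) for s in range(1, p + 1))


def min_change_indicator(x_u, x_v):
    """Smallest c in {0, 1} with c >= x_{u,s} - x_{v,s} and c >= x_{v,s} - x_{u,s}
    for every state s; this is the value a minimising ILP would pick for c_e."""
    for c in (0, 1):
        if all(c >= a - b and c >= b - a for a, b in zip(x_u, x_v)):
            return c
    raise ValueError("inputs must be 0/1 vectors")


p = 4
checks = {
    (s, t): min_change_indicator(one_hot(s, p), one_hot(t, p))
    for s, t in product(range(1, p + 1), repeat=2)
}
```

Every entry of `checks` is 1 exactly when the two states differ, so minimising the sum of the change variables counts the edges whose endpoints carry different states.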

For the softwired parsimony score, we extend the ILP formulation as follows. In addition
to the variables above, there is a binary variable $y_e$ indicating if edge e is switched "on" or
"off". A change on edge e is only counted if it is switched on. For each reticulation, exactly
one incoming edge is switched on.

$$
\begin{array}{lll}
\min & \sum_{e \in E} c_e & \\
\text{s.t.} & \sum_{s \in \mathcal{P}} x_{v,s} = 1 & \text{for all } v \in V \\
 & c_e \ge x_{u,s} - x_{v,s} - (1 - y_e) & \text{for all } e = (u,v) \in E,\ s \in \mathcal{P} \\
 & c_e \ge x_{v,s} - x_{u,s} - (1 - y_e) & \text{for all } e = (u,v) \in E,\ s \in \mathcal{P} \\
 & \sum_{v : (v,r) \in E} y_{(v,r)} = 1 & \text{for each reticulation } r \\
 & y_e = 1 & \text{for each tree-edge } e \\
 & x_{v,\alpha(v)} = 1 & \text{for each leaf } v \\
 & c_e, y_e \in \{0,1\} & \text{for all } e \in E \\
 & x_{v,s} \in \{0,1\} & \text{for all } v \in V,\ s \in \mathcal{P}
\end{array}
$$



It follows from Lemma 4 that the optimum value of this ILP is equal to the softwired
parsimony score of the given network and character.

Note that the parsimony score of an alignment can be computed by solving the above ILP
formulation for each column (character) separately, or combining them in a single ILP. Gaps
in the alignment can be accommodated in the formulation by demanding that $x_{v,\alpha(v)} = 1$
only for leaves v for which $\alpha(v)$ is not a gap.

We have implemented both ILP formulation and made the resulting user-friendly software 
publicly available [10]. Experimental results with CPLEX 12.5 on a 2Ghz laptop are in 
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Number of 


Average 




Avera 


2;e computation time (s) 






1, i 1 A 1 1 


"nnmnpy of 


Hardwired PS 


Softwired PS 






reticulations 


2-state 


3-state 


4-state 


2-state 


3-state 


4-state 


Run 1 


50 


17.0 


0.0 


0.0 


0.1 


0.1 


0.1 


0.3 


Run 2 


100 


37.0 


0.0 


0.0 


0.2 


0.0 


0.1 


0.6 


Run 3 


150 


54.1 


0.0 


0.1 


0.6 


0.1 


0.2 


0.8 


Run 4 


200 


72.8 


0.0 


0.1 


1.1 


0.1 


0.4 


1.4 


Run 5 


250 


91.3 


0.0 


0.1 


3.5 


0.1 


0.4 


2.2 


Run 6 


300 


112.6 


0.0 


0.2 


5.2 


0.1 


0.6 


3.7 



Table 1. Time needed to compute hardwired and softwired parsimony scores 
in six test runs. For each run, an average is taken over 10 simulated networks 
with randomly assigned character states. 





                         Hardwired                     Softwired
Complexity               In P for p = 2;               NP-hard for p ≥ 2
                         NP-hard for p ≥ 3
Approximation            12/11-approx. for p = 3;      no |X|^{1/3-ε}-approx. for any
                         1.3438-approx. for p ≥ 4      ε > 0 unless P = NP
Parameterized by PS      FPT                           NP-hard to decide if PS = 1
Parameterized by level   N/A                           FPT

Table 2. Summary of the complexity of computing hardwired and softwired 
parsimony scores of phylogenetic networks. 



Table 1. Networks were simulated using Dendroscope [15] and character states were assigned 
uniformly at random. 

For practical applications, parsimony scores have to be computed quickly since this 
computation needs to be repeated many times, for example when searching for a network with 
smallest parsimony score. Table 1 shows that parsimony scores can be computed in under a 
second using ILP for networks with up to 100-150 taxa and about 50 reticulations. 
Moreover, for binary and ternary characters the computation stays below one second even for 
the largest networks tested (300 taxa). This is of interest because, in practice, many columns 
of an alignment might contain only two or three different symbols. Despite the theoretical 
differences in tractability of hardwired and softwired parsimony scores, their computation 
times using ILP do not differ much in these experiments. 

7. Conclusions and open problems 

We have clarified the distinction between two possible definitions of the parsimony score of a 
phylogenetic network, which we call the "softwired" and the "hardwired" parsimony score. We 
have shown that computing the hardwired parsimony score is, in various ways, more tractable 
than computing the softwired score (see Table 2). We have also shown that the intractability 
results still hold under several topological restrictions. A stimulating open question is to 
determine the (in)approximability and fixed-parameter tractability of computing the softwired 
parsimony score on a rooted network that is simultaneously binary and tree-child: this might 
be considerably more tractable than other versions of the problem. From a practical point of 
view, we have shown that both the hardwired and softwired parsimony score can be computed 
efficiently using ILP. It will be interesting to explore, in the spirit of studies such as pj2 ITB], 
the extension of this work to the notoriously intractable "big parsimony" problem. 
Appendix: Proofs 

Lemma 7. Let $N$ be a rooted phylogenetic network on $X$ and let $B$ be a biconnected 
component of $N$. Then $B$ contains exactly one node without ancestors in $B$. 

Proof. Suppose there exist two roots in $B$, $r_1$ and $r_2$. In a rooted network $N$ there always 
exists a directed path from the root of $N$ to each node in $N$. Hence there exists a node $v$ in 
$N$ such that there is a simple directed path from $v$ to $r_1$, and a simple directed path from $v$ 
to $r_2$. (Note that we do not exclude the possibility that $v \in \{r_1, r_2\}$.) By merging these two 
paths we see that there is an undirected simple path $P$ between $r_1$ and $r_2$ such that for at 
least one of $r_1$ and $r_2$ the edge of $P$ incident to it is oriented towards it. We want to argue 
that all nodes and edges of $P$ are also in $B$, which will contradict the assumption that $r_1$ and 
$r_2$ are both roots of $B$. In fact, it holds that if any two nodes $u$ and $v$ in $B$ have a simple 
undirected path $P$ between them, all nodes and edges of $P$ are also in $B$. If this were not true, 
then $P$ would contain some node not in $B$, and this in turn would mean that, in the journey 
from $u$ to $v$, $P$ would have to pass through some articulation node twice, contradicting its 
simplicity. Hence all nodes of $P$ are in $B$, and by maximality all of the edges of $P$ are too. □ 

Transforming degree-2 nodes (from Corollary 2). Although the hardwired parsimony 
score naturally extends to them, degree-2 nodes are not formally part of our phylogenetic 
network model. Fortunately, degree-2 nodes can simply be suppressed without altering the 
hardwired parsimony score. Unfortunately this may in turn create multi-edges, which are 
likewise excluded from our definition. To deal with this, a multi-edge with multiplicity $t \ge 2$ 
between two nodes $u$ and $v$ can be encoded within the degree restrictions of a phylogenetic 
network by using a specific gadget. Namely, group the edges into $t' = \lfloor t/2 \rfloor$ pairs and for 
each pair $P_i$ ($1 \le i \le t'$) (i) delete the two edges concerned, (ii) add two new nodes $x_i, y_i$ and 
(iii) add the edges $(u, x_i)$, $(u, y_i)$, $(x_i, y_i)$, $(x_i, v)$, $(y_i, v)$. (If $t$ is odd the remaining edge can 
simply remain intact.) Again, this does not alter the hardwired parsimony score. In fact, 
both transformations also leave the cut properties of the graph unchanged, which is important 
for the proof of Corollary 2. 
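
The multi-edge gadget can be sketched in a few lines, together with a brute-force check that the minimum number of edges separating u from v is still t after the transformation. The function names and the node labels `x1`, `y1`, ... are assumptions made for illustration, and the cut routine enumerates all bipartitions, so it is only meant for tiny graphs.

```python
from itertools import combinations


def encode_multi_edge(u, v, t):
    """Simple edge list replacing t parallel edges between u and v."""
    edges = []
    for i in range(1, t // 2 + 1):
        x, y = f"x{i}", f"y{i}"  # two new nodes per pair of parallel edges
        edges += [(u, x), (u, y), (x, y), (x, v), (y, v)]
    if t % 2 == 1:
        edges.append((u, v))  # the leftover edge stays intact
    return edges


def min_cut_size(edges, s, t):
    """Fewest edges whose removal separates s from t (brute force)."""
    others = sorted({n for e in edges for n in e} - {s, t})
    best = len(edges)
    for r in range(len(others) + 1):
        for side in combinations(others, r):
            s_side = set(side) | {s}
            # Count edges with exactly one endpoint on the s-side.
            crossing = sum(1 for a, b in edges if (a in s_side) != (b in s_side))
            best = min(best, crossing)
    return best
```

Each pair gadget contributes exactly 2 to any minimum u-v cut, so `min_cut_size(encode_multi_edge("u", "v", t), "u", "v")` should agree with the multiplicity t of the original multi-edge.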

Theorem 5. For every constant $\epsilon > 0$ there is no polynomial-time approximation algorithm 
that approximates $PS_{sw}(N, \alpha)$ to a factor $|X|^{1/3 - \epsilon}$, where $N$ is a rooted binary phylogenetic 
network on $X$ and $\alpha$ is a binary character on $X$, unless P = NP. 

Proof. We reduce again from 3-SAT. Let, as before, $B = (C, V)$ be an instance of 3-SAT. 
Let $V = \{v_1, \ldots, v_n\}$ and $F := f(n, \epsilon)$. We will describe a construction of a rooted binary 
phylogenetic network $N$ and binary character $\alpha$. The first part of the construction is 
essentially a binary version of the network constructed in the proof of Theorem 4. The main 
difference will be the construction of the so-called "zero-gadgets". For ease of notation, we 
will create vertices with indegree 1 and outdegree 1, which could be suppressed, and we 
create reticulations with indegree greater than 2, which could be refined arbitrarily. Observe 
that neither suppressing vertices with indegree 1 and outdegree 1 nor refining reticulations with 
indegree greater than 2 alters the softwired parsimony score. However, to simplify the proof 
we do not suppress or refine these vertices. Furthermore, we assume without loss of generality 
that each literal is contained in at least one clause. 

Our construction is as follows. We create a root $\rho$ with two directed paths $(\rho, a_1, \ldots, a_{2n})$ 
and $(\rho, b_1, \ldots, b_{2n})$ leaving it. Then, for each variable $v_i$, we create a reticulation vertex, which 
we will also call $v_i$, with reticulation edges $(a_{2n-2i+2}, v_i)$ and $(b_{2i-1}, v_i)$. Moreover, we create 
a reticulation vertex $\lnot v_i$ with reticulation edges $(a_{2n-2i+1}, \lnot v_i)$ and $(b_{2i}, \lnot v_i)$. The vertices $v_i$ 
and $\lnot v_i$ are called "literal vertices". Then, for each clause $c$, create $F$ vertices $c_1, \ldots, c_F$, 
which we will call "clause vertices", and for each such clause vertex $c_f$, create an edge $(c_f, c'_f)$ 
to a new leaf $c'_f$ with character state $\alpha(c'_f) = 1$ (the "clause leaves"). So, in total we have $|C|F$ 
clause leaves. 

We now connect the literal vertices to the clause vertices. For each variable $x$ and its 
negation $\lnot x$, we do the following. Suppose that $x$ is in $T$ clauses, corresponding to $F \cdot T$ 
clause vertices $c_x^1, \ldots, c_x^{F \cdot T}$. Create a directed path $(x, d_x^1, \ldots, d_x^{F \cdot T})$ and edges $(d_x^k, c_x^k)$ 
for $k = 1, \ldots, F \cdot T$. Similarly, if $\lnot x$ is in $\Theta$ clauses with $F \cdot \Theta$ corresponding clause 
vertices $\gamma_x^1, \ldots, \gamma_x^{F \cdot \Theta}$, we create a directed path $(\lnot x, \delta_x^1, \ldots, \delta_x^{F \cdot \Theta})$ and edges $(\delta_x^\kappa, \gamma_x^\kappa)$ for 
$\kappa = 1, \ldots, F \cdot \Theta$. Now, for each $k \in \{1, \ldots, F \cdot T\}$ and for each $\kappa \in \{1, \ldots, F \cdot \Theta\}$, we create 
the following "zero-gadget". Let $u$ be the parent of $c_x^k$ that is reachable from $x$ (hence, in 
the first iteration, $u = d_x^k$). Replace the edge $(u, c_x^k)$ by a directed path $(u, u_1, \ldots, u_F, c_x^k)$. 
Similarly, let $\mu$ be the parent of $\gamma_x^\kappa$ that is reachable from $\lnot x$ (initially, $\mu = \delta_x^\kappa$), and replace 
the edge $(\mu, \gamma_x^\kappa)$ by a directed path $(\mu, \mu_1, \ldots, \mu_F, \gamma_x^\kappa)$. Then, for $f = 1, \ldots, F$, create a new 
reticulation $z_f$ with reticulation edges $(u_f, z_f)$ and $(\mu_f, z_f)$, and an edge $(z_f, z'_f)$ to a new 
leaf $z'_f$ with character state $\alpha(z'_f) = 0$. 

See Figure 8 for an example of the construction for $F = 1$. For larger $F$, the construction 
is similar but with more copies of each clause vertex, and more copies of each zero-gadget. 

The constructed network $N$ has $|C|F$ clause leaves (all having character state 1) and at 
most $|V||C|^2 F^3$ leaves for the zero-gadgets (all having character state 0). Hence, the total 
number of leaves is at most $|C|F + |V||C|^2 F^3$. 
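
The leaf count can be checked on a concrete formula. The sketch below counts the clause leaves and zero-gadget leaves that the construction creates (F leaves for every pair of a positive and a negative occurrence copy of the same variable) and compares the total with the stated bound. The function name and the clause encoding (signed integers, positive for a variable and negative for its negation) are assumptions made for illustration.

```python
def leaf_counts(clauses, num_vars, F):
    """Return (clause leaves, zero-gadget leaves, upper bound on all leaves).

    clauses is a list of clauses, each a list of non-zero integers:
    i stands for variable i and -i for its negation.
    """
    clause_leaves = len(clauses) * F
    zero_leaves = 0
    for x in range(1, num_vars + 1):
        T = sum(1 for c in clauses if x in c)        # clauses containing x
        theta = sum(1 for c in clauses if -x in c)   # clauses containing not-x
        # One zero-gadget per pair (k, kappa), each with F leaves.
        zero_leaves += (F * T) * (F * theta) * F
    bound = clause_leaves + num_vars * len(clauses) ** 2 * F ** 3
    return clause_leaves, zero_leaves, bound
```

For the four-clause instance (x ∨ y) ∧ (x ∨ ¬y ∨ z) ∧ (x ∨ ¬z) ∧ (¬x ∨ ¬z) with F = 1, encoded as `[[1, 2], [1, -2, 3], [1, -3], [-1, -3]]`, the network has 4 clause leaves and 6 zero-gadget leaves, well within the bound.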

We will show that if $(C, V)$ is satisfiable, $PS_{sw}(N, \alpha) = 1$ and that if $(C, V)$ is not satisfiable, 
$PS_{sw}(N, \alpha) \ge F$. We will use the following definitions. For two vertices $u$ and $v$, we say that $v$ 
is a tree-descendant of $u$ if $v$ is reachable from $u$ by a directed path that does not contain any 
reticulations apart from possibly $u$. In particular, each vertex is a tree-descendant of itself. 
Figure 8. An encoding of the 3-SAT instance $(x \lor y) \land (x \lor \lnot y \lor z) \land (x \lor 
\lnot z) \land (\lnot x \lor \lnot z)$ as described in Theorem 5 with $F = 1$. The zero-gadgets are 
indicated in grey. 



Furthermore, given an extension $\tau$ of $\alpha$ to $V(N)$, we say that there is a change at vertex $v$ if 
the character state of $v$ is different from the character states of all its parents. The number of 
changes of network $N$ and extension $\tau$ is the number of vertices at which there is a change. 
It follows from Lemma 4 that the softwired parsimony score of a network $N$ and character $\alpha$ 
is equal to the minimum number of changes over all possible extensions $\tau$ of $\alpha$ to $V(N)$. 
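
The characterisation by changes translates directly into a brute-force procedure, sketched below for toy networks only. The function name and the `parents` dictionary encoding are assumptions made for illustration, as is the convention that the root (which has no parents) is never counted as a change.

```python
from itertools import product


def softwired_score(parents, leaf_states, states):
    """Minimum number of changes over all extensions (brute force).

    parents maps every non-root vertex to the list of its parents.
    A vertex has a change if its state differs from the states of all
    its parents. The root has no entry in parents and is never counted.
    """
    vertices = set(parents) | {p for ps in parents.values() for p in ps}
    free = sorted(v for v in vertices if v not in leaf_states)
    best = None
    for choice in product(states, repeat=len(free)):
        tau = dict(leaf_states)
        tau.update(zip(free, choice))
        changes = sum(
            1 for v, ps in parents.items() if all(tau[p] != tau[v] for p in ps)
        )
        if best is None or changes < best:
            best = changes
    return best
```

As a usage example, take a root r with children u and w, a leaf a below u, a leaf c below w, and a reticulation h with parents u and w whose single child m has leaves b1 and b2. With a = c = 0 and b1 = b2 = 1, giving state 1 to h and m and state 0 to r, u and w leaves a single change, at h.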

First suppose that $(C, V)$ is satisfiable. Then, given a satisfying truth assignment, we 
can assign character states as follows. All vertices on the path $(\rho, a_1, \ldots, a_{2n})$ and all clause 
vertices $c_f$ receive state 1. All vertices on the path $(b_1, \ldots, b_{2n})$ and all reticulations $z_f$ of 
zero-gadgets receive state 0. For each variable $x$ that is set to true by the truth assignment, 
we give state 1 to all tree-descendants of literal vertex $x$ and state 0 to all tree-descendants 
of literal vertex $\lnot x$. Similarly, for each variable $x$ that is set to false by the truth assignment, 
we give state 0 to all tree-descendants of literal vertex $x$ and state 1 to all tree-descendants 
of literal vertex $\lnot x$. This concludes the assignment of character states. Now observe the 
following. Consider a clause $c$ and a corresponding clause vertex $c_f$, which has state 1. 
Since $c$ is satisfied by the truth assignment, at least one parent of $c_f$ also has state 1. Hence, 
there are no changes at the clause vertices. Moreover, for each reticulation $z_f$ of a 
zero-gadget (which has state 0), there is at least one parent that also has state 0 because for each 
variable $x$ either all tree-descendants of $x$ or all tree-descendants of $\lnot x$ have state 0. Using 
these observations, it can easily be checked that the only change is at $b_1$. Hence, if $(C, V)$ is 
satisfiable, $PS_{sw}(N, \alpha) = 1$. 

Next, we show that if $(C, V)$ is not satisfiable, $PS_{sw}(N, \alpha) \ge F$. We do this by assuming 
that $PS_{sw}(N, \alpha) < F$ and showing that this implies that $(C, V)$ is satisfiable. Let $\tau$ be an 
extension of $\alpha$ to $V(N)$ with fewer than $F$ changes. For a positive literal $x$ and clause vertex $c_x^k$ 
for a clause containing $x$, let $P(x, c_x^k)$ denote the directed path from $d_x^k$ to $c_x^k$ (with $d_x^k$ as 
defined in the construction of $N$). Similarly, for a negative literal $\lnot x$ and clause vertex $\gamma_x^\kappa$ 
for a clause containing $\lnot x$, let $P(\lnot x, \gamma_x^\kappa)$ denote the directed path from $\delta_x^\kappa$ to $\gamma_x^\kappa$ (with $\delta_x^\kappa$ as 
defined in the construction of $N$). Moreover, for any literal $\ell$ (of the form $x$ or $\lnot x$) and clause 
vertex $c_f$ for a clause containing $\ell$, let $P'(\ell, c_f)$ denote the path $P(\ell, c_f)$ excluding its first vertex. 

We compute a truth assignment as follows. A variable $x$ is set to true if and only if, for some 
clause vertex $c_x^k$ for a clause containing $x$, all vertices on the path $P(x, c_x^k)$ have 
state 1. We now prove that the obtained truth assignment is a satisfying truth assignment. 
Assume that a certain clause $c$ is not satisfied. Consider a clause vertex $c_f$ corresponding to 
clause $c$. Observe that, for two different clause vertices $c_{f_1}, c_{f_2}$ corresponding to clause $c$ and 
for any two literals $\ell_1, \ell_2$ contained in clause $c$, the paths $P'(\ell_1, c_{f_1})$ and $P'(\ell_2, c_{f_2})$ are 
vertex-disjoint. Hence, since $\tau$ has fewer than $F$ changes, there exists at least one clause vertex $c_f$ 
corresponding to clause $c$ for which there are no changes at $c'_f$ or at any vertex on a directed 
path $P'(\ell, c_f)$ for any literal $\ell$ contained in clause $c$. Since $c'_f$ has state 1, it follows that $c_f$ 
has state 1, and hence that at least one parent of $c_f$ has state 1, and hence that there exists 
at least one literal $\ell$ contained in clause $c$ such that all vertices on the path $P(\ell, c_f)$ have 
state 1. If $\ell$ is of the form $x$ (a positive literal), then this immediately implies that $x$ is 
set to true, contradicting the assumption that clause $c$ is not satisfied. 

Now consider the case that $\ell$ is of the form $\lnot x$ (a negative literal). Let $\kappa$ be such that 
$\gamma_x^\kappa = c_f$. We have shown that all vertices on the path $P(\ell, c_f)$ from $\delta_x^\kappa$ to $\gamma_x^\kappa = c_f$ have 
state 1. For every $k \in \{1, \ldots, F \cdot T\}$, there is a zero-gadget for $x$, $k$ and $\kappa$. Each such 
zero-gadget contains a directed path $(u_1, \ldots, u_F)$ on $P(x, c_x^k)$ and a directed path $(\mu_1, \ldots, \mu_F)$ on 
$P(\lnot x, \gamma_x^\kappa)$. Since all vertices on the path $P(\lnot x, \gamma_x^\kappa)$ have state 1, and there are fewer than $F$ 
changes, at least one vertex of the path $(u_1, \ldots, u_F)$ has state 0 (otherwise each of the $F$ 
leaves $z'_f$, which have state 0, would force a change at $z_f$ or at $z'_f$). Hence, at least one 
vertex on $P(x, c_x^k)$ has state 0 for all $k \in \{1, \ldots, F \cdot T\}$. It follows that $x$ is set to false 
and hence that literal $\ell = \lnot x$ is set to true, contradicting the assumption that clause $c$ is 
not satisfied. Therefore, we have shown that, if $(C, V)$ is not satisfiable, $PS_{sw}(N, \alpha) \ge F$. 

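It remains to describe how to choose $F = f(n, \epsilon)$ such that $|X|^{1/3 - \epsilon} < f(n, \epsilon)$. Recall that 
$|X| \le |C| f(n, \epsilon) + |V||C|^2 f(n, \epsilon)^3$. Then, with $n = |V|$ and recalling that $|C| = O(n^3)$, we 
can bound this by $|X| \le n^8 f(n, \epsilon)^3$ for sufficiently large $n$. Hence, it is enough to show that 
$n^{8(1/3 - \epsilon)} f(n, \epsilon)^{3(1/3 - \epsilon)} < f(n, \epsilon)$. Taking $f(n, \epsilon) = n^{g(\epsilon)}$, we need 
$8(\tfrac{1}{3} - \epsilon) + 3(\tfrac{1}{3} - \epsilon) g(\epsilon) < g(\epsilon)$, that is, $g(\epsilon) > \tfrac{8}{9\epsilon} - \tfrac{8}{3}$. 
Hence, it is sufficient to take $g(\epsilon) = \lceil \tfrac{8}{9\epsilon} \rceil$. □ 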
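
The exponent condition at the end of the proof can be checked with exact rational arithmetic. The sketch below assumes the choice g(ε) = ⌈8/(9ε)⌉, which is one value that satisfies the inequality 8(1/3 − ε) + 3(1/3 − ε)g(ε) < g(ε); the function names are hypothetical and ε is expected as a `Fraction`.

```python
from fractions import Fraction
from math import ceil


def g(eps):
    """Assumed exponent choice: the ceiling of 8 / (9 * eps)."""
    return ceil(Fraction(8, 9) / eps)


def exponent_condition_holds(eps):
    """Check 8(1/3 - eps) + 3(1/3 - eps) g(eps) < g(eps) exactly."""
    a = Fraction(1, 3) - eps
    return 8 * a + 3 * a * g(eps) < g(eps)
```

Rearranging the inequality gives 8(1/3 − ε) < 3ε · g(ε), so any g(ε) above 8/(9ε) − 8/3 works; the ceiling keeps the exponent an integer.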